DocBot is a documentation assistant built in Python on top of several AI models. It lets you interact with your PDF documentation and answers questions about its content.
- Extract text from PDF documents.
- Perform question-answering tasks on the extracted text.
- Find the most relevant image based on a query.
- Display the relevant image along with the answer to the user's query.
- Create a Virtual Environment
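It is recommended to create a virtual environment to manage dependencies (the activation command below is for Linux and macOS; on Windows, run venv\Scripts\activate instead):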
python -m venv venv
source venv/bin/activate
- Install Dependencies
Install the required packages using pip:
pip install -r requirements.txt
Open the .env file in an editor and add the following lines, replacing the placeholders with your own keys:
OPENAI_API_KEY=your_openai_api_key
GEMINI_VISION_API_KEY=your_gemini_vision_api_key
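As a minimal sketch of how these two keys could be read and checked before the app starts, the snippet below parses simple KEY=VALUE lines using only the standard library. The helper names `load_env_file` and `missing_keys` are hypothetical and are not taken from DocBot's code; the project itself may load the file differently, for example with a library such as python-dotenv.

```python
from pathlib import Path

# The two key names come from the .env instructions above.
REQUIRED_KEYS = ("OPENAI_API_KEY", "GEMINI_VISION_API_KEY")


def load_env_file(path=".env"):
    """Parse simple KEY=VALUE lines from a file and return them as a dict.

    Hypothetical helper: a stand-in for whatever loader the project uses.
    """
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an '=' sign.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(values, names=REQUIRED_KEYS):
    """Return the required key names that are absent or empty."""
    return [name for name in names if not values.get(name)]
```

Calling `missing_keys(load_env_file())` from the project folder returns an empty list when both keys are set, which makes it easy to fail early with a clear message instead of hitting an API error later.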
- Run Streamlit: Navigate to the folder containing the code and run the following command in a terminal:
streamlit run final_model.py
- Upload PDF: Once the Streamlit server is running, a browser window will open. Upload your PDF file using the provided file uploader.
- Add Your Prompt: Enter your query or prompt related to the content of the uploaded PDF in the chat interface.
- Wait for Results: DocBot will process your prompt, extract relevant information, generate captions for images, and display the most relevant image along with the answer to your query.
Note:
- Image detection and captioning may take some time because DocBot calls external APIs for YOLO (Roboflow) and Gemini Pro Vision.
- You can find the images produced by DocBot in the 'uploads' folder of the project directory.
- The cosine similarity score between image captions and your prompt will be displayed on your terminal as DocBot processes your query.
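To illustrate the cosine similarity score mentioned in the last note, here is a minimal sketch that ranks image captions against a prompt. It assumes plain word-count vectors, which is a simplification chosen for this example; the source does not say how DocBot actually vectorizes captions and prompts, and the function names and file paths are hypothetical.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Turn text into a bag-of-words vector (lowercased word -> count)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors, in [0, 1]."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # An empty text has no direction to compare.
    return dot / (norm_a * norm_b)


def most_relevant_image(prompt, captions):
    """Return (image_path, score) for the caption closest to the prompt.

    `captions` maps an image path to its caption text. Returns (None, 0.0)
    when there are no captions to compare.
    """
    if not captions:
        return None, 0.0
    prompt_vec = word_counts(prompt)
    scores = {
        path: cosine_similarity(prompt_vec, word_counts(caption))
        for path, caption in captions.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]
```

A score near 1 means the caption and the prompt share most of their words, and a score of 0 means they share none. Word counts ignore synonyms and word order, so an embedding-based vector would usually match paraphrased prompts better.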
DocBot utilizes the following APIs and models:
- Gemini Pro Vision: Used for image captioning; it provides descriptive captions for images extracted from the PDF.
- GPT-3.5-Turbo: Used for text-based question answering over the content extracted from the PDF.
- YOLO (from Roboflow): A custom-trained YOLOv5 model, trained on a PDF annotated with bounding boxes, that detects images within PDF documents.
Find a video demonstration here (the first 3 minutes are a slide-based explanation, followed by the actual demonstration).