Skip to content

g-agan/docbot_p28_A

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

10 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

DocBot

DocBot is a smart documentation assistant built in Python using various AI models. It helps you interact with your PDF documentation and answer questions related to the content of the PDF.

Features

  • Extract text from PDF documents.
  • Perform question-answering tasks on the extracted text.
  • Find the most relevant image based on a query.
  • Display the relevant image along with the answer to the user's query.

Instructions to run app

  1. Create a Virtual Environment

It is recommended to create a virtual environment to manage dependencies:

python -m venv venv
source venv/bin/activate
  1. Install Dependencies

Install the required packages using pip:

pip install -r requirements.txt

Open the .env file in an editor and add the following lines

OPENAI_API_KEY=your_openai_api_key
GEMINI_VISION_API_KEY=your_gemini_vision_api_key
  1. Run Streamlit: Navigate to the folder containing the code and run the following command in terminal:
streamlit run final_model.py
  1. Upload PDF: Once Streamlit server is running, a browser window will open. Upload your PDF file using the provided file uploader.

  2. Add Your Prompt: Enter your query or prompt related to the content of the uploaded PDF in the chat interface.

  3. Wait for Results: DocBot will process your prompt, extract relevant information, generate captions for images, and display the most relevant image along with the answer to your query.

Note:

  • Image generation and captioning processes may take some time as we utilize APIs for YOLO(roboflow) and Gemini Pro Vision.
  • You can find the images generated by DocBot in the 'uploads' folder of the project directory.
  • The cosine similarity score between image captions and your prompt will be displayed on your terminal as DocBot processes your query.

APIs Used

DocBot utilizes the following APIs and models:

  • Gemini Pro Vision: Used for image captioning, providing descriptive captions for images extracted from the PDF.

  • GPT-3.5-Turbo: Employs GPT-3.5 for text-based question-answering tasks, extracting relevant information from the PDF.

  • YOLO(from Roboflow): A custom-trained YOLO V5 model(on a PDF, which was custom trained on bounding boxes) specifically designed for image detection within PDF documents.

Demonstration

Find a video demonstration here (first 3 minutes are explanation through PPT followed by actual demonstration)

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 100.0%