DocBot

DocBot is a smart documentation assistant built in Python on top of several AI models (see APIs Used). It lets you interact with your PDF documentation and answers questions about the content of the PDF.

Features

Extract text from PDF documents.
Perform question-answering tasks on the extracted text.
Find the most relevant image based on a query.
Display the relevant image along with the answer to the user's query.

Instructions to Run the App

Create a Virtual Environment

It is recommended to create a virtual environment to manage dependencies:

python -m venv venv
source venv/bin/activate    # on Windows: venv\Scripts\activate

Install Dependencies

Install the required packages using pip:

pip install -r requirements.txt

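Configure API Keys

Open the .env file in the project folder in an editor (create it if it does not exist) and add the following lines, replacing the placeholders with your own keys: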

OPENAI_API_KEY=your_openai_api_key
GEMINI_VISION_API_KEY=your_gemini_vision_api_key
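Before launching the app, it can help to confirm that both keys are actually filled in. The sketch below is an optional, stdlib-only check and is not part of DocBot: the function names `read_env_file` and `missing_keys` are hypothetical, and it assumes the simple KEY=VALUE format shown above. It treats a value still starting with `your_` as an unreplaced placeholder.

```python
from __future__ import annotations

from pathlib import Path

# Key names taken from the .env lines above.
REQUIRED_KEYS = ("OPENAI_API_KEY", "GEMINI_VISION_API_KEY")


def read_env_file(path: str | Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes, e.g. KEY="abc" -> abc
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(path: str | Path) -> list[str]:
    """Return required keys that are absent, empty, or still a your_... placeholder."""
    values = read_env_file(path)
    return [
        key for key in REQUIRED_KEYS
        if not values.get(key) or values[key].startswith("your_")
    ]
```

For example, `missing_keys(".env")` returns an empty list once both placeholders have been replaced. If the app itself loads the file with a library such as python-dotenv (an assumption; this README does not say), this check only confirms the file contents, not what the running process sees.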

Run Streamlit: Navigate to the folder containing the code and run the following command in a terminal:

streamlit run final_model.py

Upload PDF: Once the Streamlit server is running, a browser window will open. Upload your PDF file using the provided file uploader.
Add Your Prompt: Enter your query or prompt related to the content of the uploaded PDF in the chat interface.
Wait for Results: DocBot will process your prompt, extract relevant information, generate captions for images, and display the most relevant image along with the answer to your query.
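The processing described in the last step can be outlined as a small pipeline: answer the query from the PDF text, caption each extracted image, score each caption against the query, and keep the best-scoring image. This is a hypothetical sketch, not the code in final_model.py: `process_query`, `QueryResult` and the three injected callables are illustrative stand-ins for DocBot's actual GPT-3.5-Turbo, Gemini Pro Vision and similarity calls.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class QueryResult:
    """Answer text plus the best-matching image (None if no images were found)."""
    answer: str
    best_image: Optional[str]
    best_score: float


def process_query(
    query: str,
    pdf_text: str,
    image_paths: Sequence[str],
    answer_fn: Callable[[str, str], str],   # hypothetical: (query, pdf_text) -> answer, e.g. a GPT-3.5-Turbo call
    caption_fn: Callable[[str], str],       # hypothetical: image path -> caption, e.g. a Gemini Pro Vision call
    score_fn: Callable[[str, str], float],  # hypothetical: (caption, query) -> similarity score
) -> QueryResult:
    answer = answer_fn(query, pdf_text)
    best_image: Optional[str] = None
    best_score = float("-inf")
    for path in image_paths:
        caption = caption_fn(path)
        score = score_fn(caption, query)
        print(f"{path}: {score:.3f}")  # per-image score printed to the terminal
        if score > best_score:
            best_image, best_score = path, score
    return QueryResult(answer=answer, best_image=best_image, best_score=best_score)
```

Passing the model calls in as functions keeps the flow testable without API keys; in a real app they would wrap the respective API clients.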

Note:

Generating the images and their captions may take some time, because DocBot calls remote APIs for YOLO (Roboflow) and Gemini Pro Vision.
The images generated by DocBot are saved in the 'uploads' folder of the project directory.
The cosine similarity score between each image caption and your prompt is printed in your terminal while DocBot processes your query.
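The cosine similarity mentioned in the note is cos(a, b) = a·b / (‖a‖ ‖b‖), computed on vector representations of a caption and the prompt. This README does not say how DocBot turns text into vectors, so the sketch below assumes plain bag-of-words term counts purely to illustrate the formula and the ranking step; `bag_of_words`, `cosine_similarity` and `rank_captions` are hypothetical names.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lower-cased word counts; a stand-in for whatever text vectors DocBot actually uses."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: str, b: str) -> float:
    va, vb = bag_of_words(a), bag_of_words(b)
    # Dot product only needs words present in both texts.
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0  # empty text -> 0.0 instead of dividing by zero


def rank_captions(prompt: str, captions: dict[str, str]) -> list[tuple[str, float]]:
    """Return (image name, score) pairs, most similar caption first."""
    scores = [(name, cosine_similarity(caption, prompt)) for name, caption in captions.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

For example, `rank_captions("wiring diagram", {"fig1.png": "a wiring diagram of the board", "fig2.png": "a photo of the team"})` puts fig1.png first with a score of 2/√12 ≈ 0.577, while fig2.png scores 0.0 because it shares no words with the prompt.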

APIs Used

DocBot utilizes the following APIs and models:

Gemini Pro Vision: Used for image captioning, providing descriptive captions for images extracted from the PDF.
GPT-3.5-Turbo: Used for text-based question answering, extracting relevant information from the PDF's text.
YOLO (from Roboflow): A custom YOLOv5 model, trained on a PDF annotated with bounding boxes, used to detect images within PDF documents.
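Once the YOLO model has detected image regions on a rendered PDF page, each detection has to be turned into a crop rectangle. The sketch below assumes predictions arrive as dicts with a centre point (`x`, `y`), `width`, `height` and `confidence` in pixels, which is how Roboflow-hosted detection models commonly report boxes; check this against your model's actual response. The function name and the 0.5 threshold are hypothetical.

```python
from __future__ import annotations

from typing import Iterable


def to_crop_boxes(
    predictions: Iterable[dict],
    page_width: int,
    page_height: int,
    min_confidence: float = 0.5,  # hypothetical threshold; tune for your model
) -> list[tuple[int, int, int, int]]:
    """Convert centre-based detections into (left, top, right, bottom) pixel boxes."""
    boxes = []
    for pred in predictions:
        if pred.get("confidence", 0.0) < min_confidence:
            continue  # drop low-confidence detections
        half_w, half_h = pred["width"] / 2, pred["height"] / 2
        # Clamp each edge to the page so crops never run off the image.
        left = max(0, int(round(pred["x"] - half_w)))
        top = max(0, int(round(pred["y"] - half_h)))
        right = min(page_width, int(round(pred["x"] + half_w)))
        bottom = min(page_height, int(round(pred["y"] + half_h)))
        if right > left and bottom > top:  # skip boxes that collapse after clamping
            boxes.append((left, top, right, bottom))
    return boxes
```

The (left, top, right, bottom) order matches what Pillow's `Image.crop` expects, so each box could be used to cut the detected figure out of a page image before it is saved and captioned.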

Demonstration

Find a video demonstration here. The first 3 minutes explain the project using the PPT slides, followed by the live demonstration.

Repository: g-agan/docbot_p28_A