GitGenius clones a GitHub repository, reads its files, and builds a document retrieval system using FAISS and OpenAI embeddings. It uses LangChain to process natural language queries and return relevant content from the cloned repository.
- Repository Cloning: Clone any GitHub repository into a local directory.
- File Reading: Read and preprocess all files within the cloned repository.
- Embeddings Generation: Utilize OpenAI embeddings to create document embeddings.
- Vector Storage: Store embeddings in a FAISS vector store for efficient retrieval.
- Document Retrieval: Retrieve relevant documents based on user queries.
- Python 3.8 or higher
- Git
- An OpenAI API key
- Clone the repository:
git clone https://github.com/your-username/GitGenius.git
cd GitGenius
- Install the required dependencies:
pip install -r requirements.txt
- Set your OpenAI API key in a .env file:
OPENAI_API_KEY=your-openai-key
- Set up the project environment and directories:
make setup
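For the API key step above, the key has to reach the Python process as an environment variable. How GitGenius loads it is not stated here (many projects use python-dotenv's load_dotenv for this), so the following is only a minimal standard-library sketch. The helper names load_env_file and require_openai_key are hypothetical and not part of the project.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse simple KEY=value lines from a .env file into os.environ.

    Hypothetical helper for illustration; not part of GitGenius.
    """
    env_path = Path(path)
    if not env_path.is_file():
        return {}
    loaded = {}
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip('"').strip("'")
        # Variables already set in the shell take precedence over the file.
        os.environ.setdefault(key, value)
        loaded[key] = os.environ[key]
    return loaded


def require_openai_key():
    """Return OPENAI_API_KEY or fail early with a readable message."""
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; add it to .env or export it")
    return key
```

Calling load_env_file() followed by require_openai_key() at startup surfaces a missing key before any repository is cloned or any API request is made.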
The project includes a Makefile with useful targets:
- install: Installs dependencies listed in requirements.txt.
- run: Executes the main Python script.
- setup: Creates necessary directories.
- clean: Deletes cloned_repo and vector_database.
You can also install GitGenius using the setup.py script:
python setup.py install
- Clone a repository: The script clones the specified repository and reads all files within it.
- Process documents: The contents of the files are processed, and embeddings are generated using OpenAI embeddings.
- Create and save a FAISS vector store: The embeddings are stored in a FAISS vector store, which is saved locally.
- Retrieve documents: Use natural language queries to retrieve relevant documents from the vector store.
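The first step in the list above (clone, then read every file) can be sketched with the standard library and the git command-line tool. This is an illustration under assumptions, not the project's actual code: the function names, the cloned_repo default, the shallow --depth 1 clone, and the choice to skip the .git directory and non-UTF-8 files are all hypothetical.

```python
import subprocess
from pathlib import Path


def clone_repository(repo_url, target_dir="cloned_repo"):
    """Clone repo_url into target_dir with the git CLI.

    A shallow clone (--depth 1) is an assumption made here to keep it fast.
    """
    subprocess.run(
        ["git", "clone", "--depth", "1", repo_url, str(target_dir)],
        check=True,  # raise CalledProcessError if git fails
    )
    return Path(target_dir)


def read_repository_files(root):
    """Return {relative_path: text} for every UTF-8 text file under root."""
    root = Path(root)
    contents = {}
    for path in sorted(root.rglob("*")):
        relative = path.relative_to(root)
        # Ignore directories and git's own metadata.
        if not path.is_file() or ".git" in relative.parts:
            continue
        try:
            contents[relative.as_posix()] = path.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            # Binary files (images, archives) are not valid UTF-8; skip them.
            continue
    return contents
```

A typical call would be read_repository_files(clone_repository(repo_url)), which yields a mapping from file path to file text that can then be passed to the embedding step.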
To run the example provided in the main.py script, simply execute:
python main.py
or
make run
This will clone the repository specified in repo_url, process the documents, and perform a query to demonstrate the retrieval capabilities.
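To make the retrieval step concrete without an API key, the sketch below deliberately swaps in simpler parts: a toy bag-of-words vector stands in for OpenAI embeddings, and a brute-force cosine-similarity scan stands in for the FAISS index. It shows only the shape of the idea (embed documents, embed the query, return the nearest documents); all function names are hypothetical and none of this is GitGenius code.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: term counts (a stand-in for OpenAI embeddings)."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_index(documents):
    """Map each document name to its vector (a stand-in for a FAISS store)."""
    return {name: embed(text) for name, text in documents.items()}


def retrieve(index, query, k=2):
    """Return up to k document names most similar to the query."""
    query_vec = embed(query)
    scored = [(cosine_similarity(query_vec, vec), name) for name, vec in index.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Drop documents that share no terms with the query.
    return [name for score, name in scored[:k] if score > 0]
```

Real embeddings capture meaning rather than exact word overlap, and FAISS avoids scanning every vector, but the query flow is the same: build the index once, then call retrieve(index, "your question") for each query.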
To remove the directories generated by the project (cloned_repo and vector_database, as listed for the clean target):
make clean
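On systems without make, the setup and clean targets can be approximated in Python. The directory names cloned_repo and vector_database come from the Makefile target list; that setup creates exactly these two directories is an assumption, and the function names are hypothetical.

```python
import shutil
from pathlib import Path

# Names taken from the clean target description; assumed for setup as well.
GENERATED_DIRS = ("cloned_repo", "vector_database")


def setup(base_dir="."):
    """Create the working directories if they do not exist yet."""
    for name in GENERATED_DIRS:
        (Path(base_dir) / name).mkdir(parents=True, exist_ok=True)


def clean(base_dir="."):
    """Delete the generated directories and return the names removed."""
    removed = []
    for name in GENERATED_DIRS:
        target = Path(base_dir) / name
        if target.is_dir():
            shutil.rmtree(target)
            removed.append(name)
    return removed
```

Both functions are safe to call repeatedly: setup leaves existing directories alone, and clean skips directories that are already gone.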
This project is licensed under the MIT License. See the LICENSE file for more details.