Commit

update docs
svilupp committed Aug 24, 2024
1 parent ef83cde commit fe3db67
Showing 4 changed files with 36 additions and 33 deletions.
27 changes: 13 additions & 14 deletions README.md
@@ -1,5 +1,5 @@

## DocsScraper: "A document scraping and parsing tool used to create a custom RAG database for AIHelpMe.jl"
## DocsScraper: "Efficient RAG knowledge pack creator from online Julia documentation"
[![Dev](https://img.shields.io/badge/docs-dev-blue.svg)](https://juliagenai.github.io/DocsScraper.jl/dev/) [![Build Status](https://github.com/JuliaGenAI/DocsScraper.jl/actions/workflows/CI.yml/badge.svg?branch=main)](https://github.com/JuliaGenAI/DocsScraper.jl/actions/workflows/CI.yml?query=branch%3Amain) [![Coverage](https://codecov.io/gh/JuliaGenAI/DocsScraper.jl/branch/main/graph/badge.svg)](https://codecov.io/gh/JuliaGenAI/DocsScraper.jl) [![Aqua](https://raw.githubusercontent.com/JuliaTesting/Aqua.jl/master/badge.svg)](https://github.com/JuliaTesting/Aqua.jl)


@@ -15,27 +15,27 @@ It scrapes and parses the URLs and with the help of PromptingTools.jl, creates a

## Installation

To install DocsScraper, use the Julia package manager and the package name:
To install DocsScraper, use the Julia package manager and the package name (it's not registered yet):

```julia
using Pkg
Pkg.add("DocsScraper")
Pkg.add(url="https://github.com/JuliaGenAI/DocsScraper.jl")
```


**Prerequisites:**

- Julia (version 1.10 or later).
- Internet connection for API access.
- OpenAI API keys with available credits. See [How to Obtain API Keys](#how-to-obtain-api-keys).
- OpenAI API keys with available credits. See [How to Obtain API Keys](https://svilupp.github.io/PromptingTools.jl/dev/frequently_asked_questions#Creating-OpenAI-API-Key).


## Building the Index
```julia
crawlable_urls = ["https://juliagenai.github.io/DocsScraper.jl/dev/home/"]

index_path = make_knowledge_packs(crawlable_urls;
index_name = "docsscraper", embedding_dimension = 1024, embedding_bool = true, target_path=joinpath(pwd(), "knowledge_packs"))
index_name = "docsscraper", embedding_dimension = 1024, embedding_bool = true, target_path="knowledge_packs")
```
```julia
[ Info: robots.txt unavailable for https://juliagenai.github.io:/DocsScraper.jl/dev/home/
@@ -73,14 +73,12 @@ a docsscraper__v20240823__textembedding3large-1024-Bool__v1.0.hdf5
```julia
using AIHelpMe
using AIHelpMe: pprint, load_index!

# Either use the index explicitly
aihelp(index_path, "what is DocsScraper.jl?")
# set it as the "default" index, then it will be automatically used for every question
load_index!(index_path)

# or set it as the "default" index, then it will be automatically used for every question
AIHelpMe.load_index!(index_path)

pprint(aihelp("what is DocsScraper.jl?"))
aihelp("what is DocsScraper.jl?") |> pprint
```
```julia
[ Info: Updated RAG pipeline to `:bronze` (Configuration key: "textembedding3large-1024-Bool").
@@ -96,8 +94,9 @@ PromptingTools.jl, creates a vector store that can be utilized in RAG (Retrieval
AIHelpMe.jl and PromptingTools.jl to provide efficient and relevant query retrieval, ensuring that the responses generated by the system are specific to the content in the created database.
```
Tip: Use `pprint` for nicer outputs with sources
Tip: Use `pprint` for nicer outputs with sources and `last_result` for more detailed outputs (with sources).
```julia
using AIHelpMe: pprint, last_result
print(last_result)
using AIHelpMe: last_result
# last_result() returns the last result from the RAG pipeline, ie, same as running aihelp(; return_all=true)
print(last_result())
```
25 changes: 12 additions & 13 deletions docs/src/index.md
@@ -1,5 +1,6 @@

## DocsScraper: "A document scraping and parsing tool used to create a custom RAG database for AIHelpMe.jl"
# DocsScraper

DocsScraper is a package designed to create "knowledge packs" from online documentation sites for the Julia language.

It scrapes and parses the URLs and with the help of PromptingTools.jl, creates an index of chunks and their embeddings that can be used in RAG applications. It integrates with AIHelpMe.jl and PromptingTools.jl to offer highly efficient and relevant query retrieval, ensuring that the responses generated by the system are specific to the content in the created database.
@@ -12,19 +13,19 @@ It scrapes and parses the URLs and with the help of PromptingTools.jl, creates a

## Installation

To install DocsScraper, use the Julia package manager and the package name:
To install DocsScraper, use the Julia package manager and the package name (it's not registered yet):

```julia
using Pkg
Pkg.add("DocsScraper")
Pkg.add(url="https://github.com/JuliaGenAI/DocsScraper.jl")
```


**Prerequisites:**

- Julia (version 1.10 or later).
- Internet connection for API access.
- OpenAI API keys with available credits. See [How to Obtain API Keys](#how-to-obtain-api-keys).
- OpenAI API keys with available credits. See [How to Obtain API Keys](https://svilupp.github.io/PromptingTools.jl/dev/frequently_asked_questions#Creating-OpenAI-API-Key).


## Building the Index
@@ -70,14 +71,12 @@ a docsscraper__v20240823__textembedding3large-1024-Bool__v1.0.hdf5
```julia
using AIHelpMe
using AIHelpMe: pprint, load_index!

# Either use the index explicitly
aihelp(index_path, "what is DocsScraper.jl?")

# or set it as the "default" index, then it will be automatically used for every question
AIHelpMe.load_index!(index_path)
# set it as the "default" index, then it will be automatically used for every question
load_index!(index_path)

pprint(aihelp("what is DocsScraper.jl?"))
aihelp("what is DocsScraper.jl?") |> pprint
```
```julia
[ Info: Updated RAG pipeline to `:bronze` (Configuration key: "textembedding3large-1024-Bool").
@@ -93,8 +92,8 @@ PromptingTools.jl, creates a vector store that can be utilized in RAG (Retrieval
AIHelpMe.jl and PromptingTools.jl to provide efficient and relevant query retrieval, ensuring that the responses generated by the system are specific to the content in the created database.
```
Tip: Use `pprint` for nicer outputs with sources
Tip: Use `pprint` for nicer outputs with sources and `last_result` for more detailed outputs (with sources).
```julia
using AIHelpMe: pprint, last_result
print(last_result)
using AIHelpMe: last_result
print(last_result())
```
1 change: 0 additions & 1 deletion docs/src/working.md

This file was deleted.

16 changes: 11 additions & 5 deletions examples/scripts/using_with_AIHelpMe.jl
@@ -4,16 +4,22 @@ Pkg.add(url = "https://github.com/JuliaGenAI/DocsScraper.jl/")
Pkg.add("AIHelpMe")
using DocsScraper
using AIHelpMe
using AIHelpMe: pprint
using AIHelpMe: pprint, last_result

# Creating the index
crawlable_urls = ["https://juliagenai.github.io/DocsScraper.jl/dev/home/"]
index_path = make_knowledge_packs(crawlable_urls;
index_name = "docsscraper", embedding_dimension = 1024, embedding_bool = true,
target_path = joinpath(pwd(), "knowledge_packs"))
target_path = "knowledge_packs")

# Using the index with AIHelpMe
# Using the index with AIHelpMe, load it as the default index
AIHelpMe.load_index!(index_path)

pprint(aihelp("what is DocsScraper.jl?"))
pprint(aihelp("how do I install DocsScraper?"))
# Ask questions // pprint is optional
aihelp("what is DocsScraper.jl?") |> pprint

aihelp("how do I install DocsScraper?") |> pprint

# Get more detailed outputs with sources for the last answer
# Identical to running aihelp(; return_all=true)
last_result() |> pprint
