## DocsScraper: "A document scraping and parsing tool used to create a custom RAG database for AIHelpMe.jl"

[![Dev](https://img.shields.io/badge/docs-dev-blue.svg)](https://splendidbug.github.io/DocsScraper.jl/dev/) [![Build Status](https://github.com/splendidbug/DocsScraper.jl/actions/workflows/CI.yml/badge.svg?branch=main)](https://github.com/splendidbug/DocsScraper.jl/actions/workflows/CI.yml?query=branch%3Amain) [![Aqua](https://raw.githubusercontent.com/JuliaTesting/Aqua.jl/master/badge.svg)](https://github.com/JuliaTesting/Aqua.jl)

DocsScraper is a package designed to create a vector database from input URLs. It scrapes and parses the URLs and, with the help of PromptingTools.jl, creates a vector store that can be used in RAG applications. It integrates with AIHelpMe.jl and PromptingTools.jl to offer highly efficient and relevant query retrieval, ensuring that the responses generated by the system are specific to the content in the created database.
## Features

- **URL Scraping and Parsing**: Automatically scrapes and parses input URLs to extract relevant information, paying particular attention to code snippets and code blocks. Offers an option to customize the chunk sizes.
- **URL Crawling**: Optionally crawls the input URLs to look for multiple pages in the same domain.
- **Vector Database Creation**: Leverages PromptingTools.jl to create embeddings with a customizable embedding model, size, and type (Bool or Float32).
## Installation

To install DocsScraper, use the Julia package manager:

```julia
using Pkg
Pkg.add("DocsScraper")
```
**Prerequisites:**

- Julia (version 1.10 or later).
- Internet connection for API access.
- OpenAI API keys with available credits. See [How to Obtain API Keys](#how-to-obtain-api-keys).
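
DocsScraper delegates the embedding calls to PromptingTools.jl, which conventionally reads the OpenAI key from the `OPENAI_API_KEY` environment variable. A minimal sketch of setting it for the current Julia session (assuming that standard variable name; you can instead set it in your shell profile):

```julia
# Minimal sketch: expose the OpenAI key to PromptingTools.jl before building a knowledge pack.
# Assumes the standard `OPENAI_API_KEY` environment variable; replace the placeholder with your key.
ENV["OPENAI_API_KEY"] = "sk-..."
```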
## Usage

```julia
using DocsScraper

index = make_knowledge_packs(; single_urls=["https://docs.sciml.ai/Overview/stable/"], index_name="sciml", embedding_size=1024)
```
```
[ Info: robots.txt unavailable for https://docs.sciml.ai:/Overview/stable/
[ Info: Processing https://docs.sciml.ai/Overview/stable/...
. . .
[ Info: Parsing URL: https://docs.sciml.ai/Overview/stable/
[ Info: Scraping done: 69 chunks
[ Info: Removed 0 short chunks
[ Info: Removed 0 duplicate chunks
[ Info: Created embeddings for sciml. Cost: $0.001
a sciml__v20240817__textembedding3large-1024-Bool__v1.0.hdf5
[ Info: ARTIFACT: sciml__v20240817__textembedding3large-1024-Bool__v1.0.tar.gz
┌ Info: sha256:
└ bytes2hex(open(sha256, fn_output)) = "58bec6dd9877d1b926c96fceb6aacfe5ef6395e57174d9043ccf18560d7b49bb"
┌ Info: git-tree-sha1:
└ Tar.tree_hash(IOBuffer(inflate_gzip(fn_output))) = "031c3f51fd283e89f294b3ce9255561cc866b71a"
```
`make_knowledge_packs` is the entry point to the package. It takes in the URLs to parse and returns an index, which can be passed to AIHelpMe.jl to answer queries over the built knowledge packs.
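
The call above handles single pages; the Features list also mentions optional crawling of whole documentation sites. The sketch below is a hypothetical illustration of combining the two: the `crawlable_urls` keyword and the second URL are assumptions for illustration, not confirmed by this README, while `single_urls`, `index_name`, and `embedding_size` match the example above.

```julia
using DocsScraper

# Hypothetical sketch: build one knowledge pack from a crawled site plus an extra single page.
# `crawlable_urls` is an assumed name for the crawling option described in Features;
# `single_urls`, `index_name`, and `embedding_size` are the options demonstrated above.
index = make_knowledge_packs(;
    crawlable_urls = ["https://docs.sciml.ai/Overview/stable/"],   # crawl pages within this site (assumed keyword)
    single_urls    = ["https://docs.sciml.ai/DiffEqDocs/stable/"], # parse only these pages (hypothetical extra URL)
    index_name     = "sciml",
    embedding_size = 1024)
```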
**Using the created index:**

```julia
using AIHelpMe
sciml_index = AIHelpMe.load_index!(index)
aihelp(sciml_index, "what is Sciml")
```
```
[ Info: Updated RAG pipeline to `:bronze` (Configuration key: "textembedding3large-1024-Bool").
[ Info: Loaded index from packs: julia into MAIN_INDEX
[ Info: Loading index from sciml__v20240817__textembedding3large-1024-Bool__v1.0.hdf5
[ Info: Loaded index a file sciml__v20240817__textembedding3large-1024-Bool__v1.0.hdf5 into MAIN_INDEX
[ Info: Done with RAG. Total cost: $0.01
--------------------
AI Message
--------------------
SciML, or Scientific Machine Learning, is an ecosystem developed in the Julia programming language, aimed at solving equations and modeling systems while integrating the capabilities of scientific computing and machine learning. It provides a range of tools with unified APIs, enabling features like differentiability, sensitivity analysis, high performance, and parallel implementations. The SciML organization supports these tools and promotes their coherent use for various scientific applications.
```
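
Once a knowledge pack is loaded it stays active for the session, so, assuming AIHelpMe.jl falls back to the most recently loaded index (the `MAIN_INDEX` mentioned in the log above) when no index is passed, a follow-up question can be a single call:

```julia
# Hedged sketch: ask a follow-up question against the already-loaded knowledge pack.
# Assumes aihelp reuses MAIN_INDEX when called without an explicit index argument.
aihelp("How do I solve an ODE with SciML?")
```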
## Parser