diff --git a/README.md b/README.md
index 2fa6f85..02b3ed1 100644
--- a/README.md
+++ b/README.md
@@ -51,7 +51,7 @@ This document outlines the core functions used in the `VirtualHavruta` class. Th
| `retrieve_docs(self, query: str, msg_id: str = '', filter_mode: str = 'primary')` | Retrieves documents matching a query, filtered as primary or secondary sources. | - `query: str`: Query string
- `msg_id: str = ''`: Message ID for logging
- `filter_mode: str = 'primary'`: 'primary' or 'secondary' | List of documents |
| `retrieve_docs_metadata_filtering(self, query: str, msg_id: str = '', metadata_filter=None)` | Retrieves documents matching a query, filtered based on metadata. | - `query: str`: Query string
- `msg_id: str = ''`: Message ID for logging
- `metadata_filter=None`: Metadata filter that the returned documents must satisfy | List of documents |
| `retrieve_nodes_matching_linker_results(self, linker_results: list[dict], msg_id: str = '', filter_mode: str = 'primary', url_prefix: str = "https://www.sefaria.org/")` | Retrieves nodes corresponding to linker results from the graph database. | - `linker_results: list[dict]`: Results from the linker API
- `msg_id: str = ''`: Message ID for logging
- `filter_mode: str = 'primary'`: 'primary' or 'secondary'
- `url_prefix: str`: URL prefix | List of `Document` objects |
-| `get_retrieval_results_knowledge_graph(self, url: str, direction: str, order: int, score_central_node: float, filter_mode_nodes: str | None = None, msg_id: str = '')` | Retrieves neighbor nodes of a given URL from the knowledge graph. | - `url: str`: Central node URL
- `direction: str`: Edge direction ('incoming', 'outgoing', 'both_ways')
- `order: int`: Number of hops
- `score_central_node: float`: Central node score
- `filter_mode_nodes: str | None = None`: Node filter mode
- `msg_id: str = ''`: Message ID for logging | List of tuples `(Document, score)` |
+| `get_retrieval_results_knowledge_graph(self, url: str, direction: str, order: int, score_central_node: float, filter_mode_nodes: str = None, msg_id: str = '')` | Retrieves neighbor nodes of a given URL from the knowledge graph. | - `url: str`: Central node URL
- `direction: str`: Edge direction ('incoming', 'outgoing', 'both_ways')
- `order: int`: Number of hops
- `score_central_node: float`: Central node score
- `filter_mode_nodes: str = None`: Node filter mode
- `msg_id: str = ''`: Message ID for logging | List of tuples `(Document, score)` |
| `query_graph_db_by_url(self, urls: list[str])` | Queries the graph database for nodes with given URLs. | - `urls: list[str]`: List of URLs | List of `Document` objects |
| `query_sefaria_linker(self, text_title="", text_body="", with_text=1, debug=0, max_segments=0, msg_id: str = '')` | Queries the Sefaria Linker API and returns the JSON response. | - `text_title: str = ""`: Text title
- `text_body: str = ""`: Text body
- `with_text: int = 1`: Include text in response
- `debug: int = 0`: Debug flag
- `max_segments: int = 0`: Max segments
- `msg_id: str = ''`: Message ID for logging | JSON response (dict or str) |
| `retrieve_docs_linker(self, screen_res: str, enriched_query: str, msg_id: str = '', filter_mode: str = 'primary')` | Retrieves documents from the Sefaria Linker API based on a query. | - `screen_res: str`: Screen result query
- `enriched_query: str`: Enriched query
- `msg_id: str = ''`: Message ID for logging
- `filter_mode: str = 'primary'`: 'primary' or 'secondary' | List of document dictionaries |
@@ -64,7 +64,7 @@ This document outlines the core functions used in the `VirtualHavruta` class. Th
| Function Name | Purpose | Input Parameters | Output |
|---------------|---------|------------------|--------|
| `select_reference(self, query: str, retrieval_res, msg_id: str = '')` | Selects useful references from retrieval results using a language model. | - `query: str`: Query string
- `retrieval_res`: Retrieved documents
- `msg_id: str = ''`: Message ID for logging | Tuple `(selected_retrieval_res: list, tokens_used: int)` |
-| `sort_reference(self, scripture_query: str, enriched_query: str, retrieval_res, filter_mode: str | None = 'primary', msg_id: str = '')` | Sorts retrieval results based on relevance to the query. | - `scripture_query: str`: Scripture query
- `enriched_query: str`: Enriched query
- `retrieval_res`: Retrieval results
- `filter_mode: str | None = 'primary'`: Filter mode
- `msg_id: str = ''`: Message ID for logging | Tuple `(sorted_src_rel_dict: dict, src_data_dict: dict, src_ref_dict: dict, total_tokens: int)` |
+| `sort_reference(self, scripture_query: str, enriched_query: str, retrieval_res, filter_mode: str = 'primary', msg_id: str = '')` | Sorts retrieval results based on relevance to the query. | - `scripture_query: str`: Scripture query
- `enriched_query: str`: Enriched query
- `retrieval_res`: Retrieval results
- `filter_mode: str = 'primary'`: Filter mode
- `msg_id: str = ''`: Message ID for logging | Tuple `(sorted_src_rel_dict: dict, src_data_dict: dict, src_ref_dict: dict, total_tokens: int)` |
| `merge_references_by_url(self, retrieval_res: list[tuple[Document, float]], msg_id: str = '')` | Merges chunks with the same URL to consolidate content and sources. | - `retrieval_res: list[tuple[Document, float]]`: Documents and scores
- `msg_id: str = ''`: Message ID for logging | Tuple `(sorted_src_rel_dict: dict, src_data_dict: dict, src_ref_dict: dict)` |
| `merge_linker_refs(self, retrieved_docs: list, p_sorted_src_rel_dict: dict, p_src_data_dict: dict, p_src_ref_dict: dict, msg_id: str = '')` | Merges new linker references into existing reference dictionaries. | - `retrieved_docs: list`: New documents
- `p_sorted_src_rel_dict: dict`: Existing relevance dict
- `p_src_data_dict: dict`: Existing data dict
- `p_src_ref_dict: dict`: Existing ref dict
- `msg_id: str = ''`: Message ID for logging | Tuple of updated dictionaries |
@@ -75,7 +75,7 @@ This document outlines the core functions used in the `VirtualHavruta` class. Th
| Function Name | Purpose | Input Parameters | Output |
|---------------|---------|------------------|--------|
| `score_document_by_graph_distance(self, n_hops: int, start_score: float, score_decrease_per_hop: float) -> float` | Scores a document based on its distance from the central node in the graph. | - `n_hops: int`: Number of hops
- `start_score: float`: Starting score
- `score_decrease_per_hop: float`: Score decrease per hop | `float` score |
-| `rank_documents(self, chunks: list[Document], enriched_query: str, scripture_query: str | None = None, semantic_similarity_scores: list[float] | None = None, filter_mode: str | None = None, msg_id: str = '')` | Ranks documents based on relevance to the query. | - `chunks: list[Document]`: Documents to rank
- `enriched_query: str`: Enriched query
- `scripture_query: str | None = None`: Scripture query
- `semantic_similarity_scores: list[float] | None = None`: Precomputed scores
- `filter_mode: str | None = None`: Filter mode
- `msg_id: str = ''`: Message ID for logging | Tuple `(sorted_chunks: list[Document], ranking_scores: list[float], total_token_count: int)` |
+| `rank_documents(self, chunks: list[Document], enriched_query: str, scripture_query: str = None, semantic_similarity_scores: list[float] = None, filter_mode: str = None, msg_id: str = '')` | Ranks documents based on relevance to the query. | - `chunks: list[Document]`: Documents to rank
- `enriched_query: str`: Enriched query
- `scripture_query: str = None`: Scripture query
- `semantic_similarity_scores: list[float] = None`: Precomputed scores
- `filter_mode: str = None`: Filter mode
- `msg_id: str = ''`: Message ID for logging | Tuple `(sorted_chunks: list[Document], ranking_scores: list[float], total_token_count: int)` |
| `compute_semantic_similarity_documents_query(self, documents: list[Document], query: str, msg_id: str = '')` | Computes semantic similarity between documents and a query. | - `documents: list[Document]`: Documents
- `query: str`: Query string
- `msg_id: str = ''`: Message ID for logging | `np.array` of similarity scores |
| `get_reference_class(self, documents: list[Document], scripture_query: str, enriched_query: str, msg_id: str = '')` | Determines the reference class for each document based on the query. | - `documents: list[Document]`: Documents
- `scripture_query: str`: Scripture query
- `enriched_query: str`: Enriched query
- `msg_id: str = ''`: Message ID for logging | Tuple `(reference_classes: np.array, total_token_count: int)` |
| `get_page_rank_scores(self, documents: list[Document], msg_id: str = '')` | Retrieves PageRank scores for documents. | - `documents: list[Document]`: Documents
- `msg_id: str = ''`: Message ID for logging | `np.array` of PageRank scores |
@@ -86,8 +86,8 @@ This document outlines the core functions used in the `VirtualHavruta` class. Th
| Function Name | Purpose | Input Parameters | Output |
|---------------|---------|------------------|--------|
-| `get_graph_neighbors_by_url(self, url: str, relationship: str, depth: int, filter_mode_nodes: str | None = None, msg_id: str = '')` | Retrieves neighbor nodes from the graph database based on a URL. | - `url: str`: Central node URL
- `relationship: str`: Edge relationship
- `depth: int`: Neighbor depth
- `filter_mode_nodes: str | None = None`: Node filter mode
- `msg_id: str = ''`: Message ID for logging | List of tuples `(Node, distance)` |
-| `get_chunks_corresponding_to_nodes(self, nodes: list[Document], batch_size: int = 20, max_nodes: int | None = None, unique_url: bool = True, msg_id: str = '')` | Retrieves chunks corresponding to given nodes. | - `nodes: list[Document]`: Nodes
- `batch_size: int = 20`: Batch size
- `max_nodes: int | None = None`: Max nodes
- `unique_url: bool = True`: Ensure unique URLs
- `msg_id: str = ''`: Message ID for logging | List of `Document` objects |
+| `get_graph_neighbors_by_url(self, url: str, relationship: str, depth: int, filter_mode_nodes: str = None, msg_id: str = '')` | Retrieves neighbor nodes from the graph database based on a URL. | - `url: str`: Central node URL
- `relationship: str`: Edge relationship
- `depth: int`: Neighbor depth
- `filter_mode_nodes: str = None`: Node filter mode
- `msg_id: str = ''`: Message ID for logging | List of tuples `(Node, distance)` |
+| `get_chunks_corresponding_to_nodes(self, nodes: list[Document], batch_size: int = 20, max_nodes: int = None, unique_url: bool = True, msg_id: str = '')` | Retrieves chunks corresponding to given nodes. | - `nodes: list[Document]`: Nodes
- `batch_size: int = 20`: Batch size
- `max_nodes: int = None`: Max nodes
- `unique_url: bool = True`: Ensure unique URLs
- `msg_id: str = ''`: Message ID for logging | List of `Document` objects |
| `get_node_corresponding_to_chunk(self, chunk: Document, msg_id: str = '')` | Retrieves the node corresponding to a given chunk. | - `chunk: Document`: Chunk document
- `msg_id: str = ''`: Message ID for logging | `Document` object representing the node |
| `is_primary_document(self, doc: Document) -> bool` | Checks if a document is a primary document. | - `doc: Document`: Document to check | `bool` |
@@ -114,7 +114,7 @@ This document outlines the core functions used in the `VirtualHavruta` class. Th
| Function Name | Purpose | Input Parameters | Output |
|---------------|---------|------------------|--------|
-| `graph_traversal_retriever(self, screen_res: str, scripture_query: str, enriched_query: str, filter_mode_nodes: str | None = None, linker_results: list[dict] | None = None, semantic_search_results: list[tuple[Document, float]] | None = None, msg_id: str = '')` | Retrieves related chunks by traversing the graph starting from seed chunks. | - `screen_res: str`: Screen result query
- `scripture_query: str`: Scripture query
- `enriched_query: str`: Enriched query
- `filter_mode_nodes: str | None = None`: Node filter mode
- `linker_results: list[dict] | None = None`: Linker results
- `semantic_search_results: list[tuple[Document, float]] | None = None`: Semantic search results
- `msg_id: str = ''`: Message ID for logging | Tuple `(retrieval_res_kg: list[tuple[Document, float]], total_token_count: int)` |
+| `graph_traversal_retriever(self, screen_res: str, scripture_query: str, enriched_query: str, filter_mode_nodes: str = None, linker_results: list[dict] = None, semantic_search_results: list[tuple[Document, float]] = None, msg_id: str = '')` | Retrieves related chunks by traversing the graph starting from seed chunks. | - `screen_res: str`: Screen result query
- `scripture_query: str`: Scripture query
- `enriched_query: str`: Enriched query
- `filter_mode_nodes: str = None`: Node filter mode
- `linker_results: list[dict] = None`: Linker results
- `semantic_search_results: list[tuple[Document, float]] = None`: Semantic search results
- `msg_id: str = ''`: Message ID for logging | Tuple `(retrieval_res_kg: list[tuple[Document, float]], total_token_count: int)` |
## Configuration Guide for config.yaml