improve deduping issue (#28)
* improve deduping issue

* fix comment

* commit format

* default embeddings

* update
prasmussen15 authored Aug 23, 2024
1 parent 9cc9883 commit a1e5488
Showing 4 changed files with 201 additions and 188 deletions.
core/prompts/dedupe_edges.py: 112 changes (55 additions, 57 deletions)
@@ -5,66 +5,64 @@


class Prompt(Protocol):
- v1: PromptVersion
- v2: PromptVersion
- edge_list: PromptVersion
+ v1: PromptVersion
+ v2: PromptVersion
+ edge_list: PromptVersion


class Versions(TypedDict):
- v1: PromptFunction
- v2: PromptFunction
- edge_list: PromptFunction
+ v1: PromptFunction
+ v2: PromptFunction
+ edge_list: PromptFunction


def v1(context: dict[str, Any]) -> list[Message]:
- return [
- Message(
- role='system',
- content='You are a helpful assistant that de-duplicates relationship from edge lists.',
- ),
- Message(
- role='user',
- content=f"""
- Given the following context, deduplicate edges from a list of new edges given a list of existing edges:
- Existing Edges:
+ return [
+ Message(
+ role='system',
+ content='You are a helpful assistant that de-duplicates relationship from edge lists.',
+ ),
+ Message(
+ role='user',
+ content=f"""
+ Given the following context, deduplicate facts from a list of new facts given a list of existing facts:
+ Existing Facts:
{json.dumps(context['existing_edges'], indent=2)}
- New Edges:
+ New Facts:
{json.dumps(context['extracted_edges'], indent=2)}
Task:
- 1. start with the list of edges from New Edges
- 2. If any edge in New Edges is a duplicate of an edge in Existing Edges, replace the new edge with the existing
- edge in the list
- 3. Respond with the resulting list of edges
+ If any facts in New Facts is a duplicate of a fact in Existing Facts,
+ do not return it in the list of unique facts.
Guidelines:
- 1. Use both the name and fact of edges to determine if they are duplicates,
- duplicate edges may have different names
+ 1. The facts do not have to be completely identical to be duplicates,
+ they just need to have similar factual content
Respond with a JSON object in the following format:
{{
- "new_edges": [
+ "unique_facts": [
{{
- "fact": "one sentence description of the fact"
+ "uuid": "unique identifier of the fact"
}}
]
}}
""",
- ),
- ]
+ ),
+ ]


def v2(context: dict[str, Any]) -> list[Message]:
- return [
- Message(
- role='system',
- content='You are a helpful assistant that de-duplicates relationship from edge lists.',
- ),
- Message(
- role='user',
- content=f"""
+ return [
+ Message(
+ role='system',
+ content='You are a helpful assistant that de-duplicates relationship from edge lists.',
+ ),
+ Message(
+ role='user',
+ content=f"""
Given the following context, deduplicate edges from a list of new edges given a list of existing edges:
Existing Edges:
@@ -94,44 +92,44 @@ def v2(context: dict[str, Any]) -> list[Message]:
]
}}
""",
- ),
- ]
+ ),
+ ]


def edge_list(context: dict[str, Any]) -> list[Message]:
- return [
- Message(
- role='system',
- content='You are a helpful assistant that de-duplicates edges from edge lists.',
- ),
- Message(
- role='user',
- content=f"""
- Given the following context, find all of the duplicates in a list of edges:
- Edges:
+ return [
+ Message(
+ role='system',
+ content='You are a helpful assistant that de-duplicates edges from edge lists.',
+ ),
+ Message(
+ role='user',
+ content=f"""
+ Given the following context, find all of the duplicates in a list of facts:
+ Facts:
{json.dumps(context['edges'], indent=2)}
Task:
- If any edge in Edges is a duplicate of another edge, return the fact of only one of the duplicate edges
+ If any facts in Facts is a duplicate of another fact, return a new fact with one of their uuid's.
Guidelines:
- 1. Use both the name and fact of edges to determine if they are duplicates,
- edges with the same name may not be duplicates
- 2. The final list should have only unique facts. If 3 edges are all duplicates of each other, only one of their
+ 1. The facts do not have to be completely identical to be duplicates, they just need to have similar content
+ 2. The final list should have only unique facts. If 3 facts are all duplicates of each other, only one of their
facts should be in the response
Respond with a JSON object in the following format:
{{
- "unique_edges": [
+ "unique_facts": [
{{
- "fact": "fact of a unique edge",
+ "uuid": "unique identifier of the fact",
+ "fact": "fact of a unique edge"
}}
]
}}
""",
- ),
- ]
+ ),
+ ]


versions: Versions = {'v1': v1, 'v2': v2, 'edge_list': edge_list}
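Under the reworked `v1` prompt, the model no longer echoes edges back; it lists the `uuid` of each New Fact that is not a duplicate, under `unique_facts`. A minimal sketch of how a caller might apply that reply, assuming each extracted edge is a dict carrying a `uuid` key and the reply arrives as a raw JSON string. `keep_unique_new_edges` is a hypothetical helper, not part of this commit.

```python
import json
from typing import Any


def keep_unique_new_edges(
    extracted_edges: list[dict[str, Any]], llm_reply: str
) -> list[dict[str, Any]]:
    """Return the extracted edges whose uuid the model listed in unique_facts.

    Hypothetical helper: assumes every edge dict has a 'uuid' key and that
    llm_reply follows the v1 format {"unique_facts": [{"uuid": ...}]}.
    """
    response = json.loads(llm_reply)
    # uuids the model judged not to duplicate any Existing Fact.
    unique_uuids = {item.get('uuid') for item in response.get('unique_facts', [])}
    # Filter the caller's list instead of trusting the reply's payload, so an
    # invented uuid matches nothing and input order is preserved.
    return [edge for edge in extracted_edges if edge['uuid'] in unique_uuids]


extracted = [
    {'uuid': 'e1', 'fact': 'Alice works at Acme'},
    {'uuid': 'e2', 'fact': 'Alice is employed by Acme'},
    {'uuid': 'e3', 'fact': 'Bob lives in Paris'},
]
# Suppose e1 and e2 duplicate existing facts; "e9" is a uuid the model made up.
reply = '{"unique_facts": [{"uuid": "e3"}, {"uuid": "e9"}]}'
print([edge['uuid'] for edge in keep_unique_new_edges(extracted, reply)])  # ['e3']
```

Filtering the caller's own list, rather than rebuilding edges from the reply, means a hallucinated uuid simply matches nothing.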

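The `edge_list` prompt works within a single list and, per its Task line, may return a new fact text under one of the duplicates' uuids. The sketch below shows one way a caller could fold that reply back onto the input edges. The dict shape of an edge, the choice to overwrite `fact` with the model's wording, and the name `resolve_unique_edges` are all assumptions for illustration, not behavior confirmed by this commit.

```python
import json
from typing import Any


def resolve_unique_edges(
    edges: list[dict[str, Any]], llm_reply: str
) -> list[dict[str, Any]]:
    """Fold an edge_list reply back onto the input edges.

    Hypothetical helper: assumes the reply follows
    {"unique_facts": [{"uuid": ..., "fact": ...}]} and each edge is a dict
    with 'uuid' and 'fact' keys.
    """
    by_uuid = {edge['uuid']: edge for edge in edges}
    response = json.loads(llm_reply)
    result: list[dict[str, Any]] = []
    seen: set[str] = set()
    for item in response.get('unique_facts', []):
        uuid = item.get('uuid')
        # Drop uuids that are not in the input, and keep each edge at most once.
        if uuid not in by_uuid or uuid in seen:
            continue
        seen.add(uuid)
        # Copy so the caller's edge is not mutated; prefer the model's
        # (possibly merged) wording of the fact when it supplies one.
        kept = dict(by_uuid[uuid])
        if item.get('fact'):
            kept['fact'] = item['fact']
        result.append(kept)
    return result


edges = [
    {'uuid': 'e1', 'fact': 'Alice works at Acme'},
    {'uuid': 'e2', 'fact': 'Alice is employed by Acme'},
    {'uuid': 'e3', 'fact': 'Bob lives in Paris'},
]
reply = json.dumps({'unique_facts': [
    {'uuid': 'e1', 'fact': 'Alice works at Acme Corp'},
    {'uuid': 'e3', 'fact': 'Bob lives in Paris'},
    {'uuid': 'e1', 'fact': 'Alice is at Acme'},
]})
for edge in resolve_unique_edges(edges, reply):
    print(edge['uuid'], edge['fact'])
```

Keeping the first occurrence of a repeated uuid is an arbitrary tie-break; a caller could just as well reject a reply that names the same edge twice.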

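Both prompts build their user message as an f-string, so the JSON response template has to double every literal brace (`{{`, `}}`), while single braces interpolate expressions such as `json.dumps(context['edges'], indent=2)`. A small standalone check, using made-up edges in place of the real context, that the rendered template comes out as plain JSON:

```python
import json

# Made-up edges standing in for context['edges'].
edges = [
    {'uuid': 'e1', 'fact': 'Alice works at Acme'},
    {'uuid': 'e2', 'fact': 'Alice is employed by Acme'},
]

# Single braces interpolate an expression; doubled braces emit a literal brace.
prompt = f"""
Facts:
{json.dumps(edges, indent=2)}
Respond with a JSON object in the following format:
{{
    "unique_facts": [
        {{
            "uuid": "unique identifier of the fact",
            "fact": "fact of a unique edge"
        }}
    ]
}}
"""

# Everything after "format:" is the response template; it should parse as JSON.
template = prompt.split('format:', 1)[1]
print(json.loads(template)['unique_facts'][0]['uuid'])  # unique identifier of the fact
```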