Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Clean input before passing it to the llm #238

Conversation

paul-paliychuk
Copy link
Collaborator

@paul-paliychuk paul-paliychuk commented Dec 11, 2024

Important

Adds _clean_input method to sanitize input strings in LLMClient and OpenAIClient, with tests for validation.

  • Behavior:
    • Adds _clean_input method in client.py to remove invalid Unicode and control characters from input strings.
    • Integrates _clean_input into generate_response in client.py and openai_client.py to sanitize message content before processing.
  • Testing:
    • Adds test_client.py to test _clean_input with various cases, including control characters, zero-width characters, and invalid Unicode.
  • Misc:
    • Updates version in pyproject.toml to 0.5.0pre5.

This description was created by Ellipsis for e852493. It will automatically update as commits are pushed.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on 96e6377 in 8 seconds

More details
  • Looked at 24 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 0 drafted comments based on config settings.

Workflow ID: wflow_f5hQKesiGknn0f9n


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ Changes requested. Reviewed everything up to f40ca8c in 2 minutes and 33 seconds

More details
  • Looked at 104 lines of code in 3 files
  • Skipped 0 files when reviewing.
  • Skipped posting 2 drafted comments based on config settings.
1. graphiti_core/llm_client/openai_client.py:91
  • Draft comment:
    The _clean_input method is called twice for each message, once in generate_response and once in _generate_response. This is redundant. Consider removing one of the calls to optimize performance.
  • Reason this comment was not posted:
    Confidence changes required: 80%
    The _clean_input method is being called twice for each message, once in generate_response and once in _generate_response. This is redundant and can be optimized.
2. tests/llm_client/test_client.py:34
  • Draft comment:
    Remove unnecessary comment as it repeats the information already present in the code.
  • Reason this comment was not posted:
    Confidence changes required: 10%
    The test cases in the test_clean_input function are well-structured and cover various scenarios. However, the comment on line 34 is not necessary as it repeats the information already present in the code.

Workflow ID: wflow_5X35jGYEohyhiscC


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

graphiti_core/llm_client/client.py Show resolved Hide resolved
Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on d8fa84e in 14 seconds

More details
  • Looked at 41 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 2 drafted comments based on config settings.
1. graphiti_core/llm_client/client.py:59
  • Draft comment:
    The retry logic was moved from above _clean_input to _generate_response_with_retry. Ensure this change is intentional and does not affect other parts of the code.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The _clean_input method is correctly implemented and integrated into the generate_response method. However, the retry logic was moved, which is not a problem but should be noted.
2. graphiti_core/llm_client/client.py:60
  • Draft comment:
    The variable name input shadows the built-in Python function input(). Consider renaming it to something more descriptive, like input_text.
  • Reason this comment was not posted:
    Confidence changes required: 80%
    The code is mostly well-structured, but there are some issues with variable naming and DRY principle.

Workflow ID: wflow_rOqbjW6sty1EarQV


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Copy link
Member

@danielchalef danielchalef left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on e852493 in 10 seconds

More details
  • Looked at 12 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 2 drafted comments based on config settings.
1. pyproject.toml:3
  • Draft comment:
    The version update to "0.5.0pre5" is appropriate for the new feature addition. Ensure that this version is consistently used across all relevant documentation and deployment scripts.
  • Reason this comment was not posted:
    Confidence changes required: 0%
    The version update in the pyproject.toml file is consistent with the changes made in the PR, which is to add a new feature for input cleaning. This is a minor version update, which is appropriate for adding new functionality in a backward-compatible manner.
2. pyproject.toml:3
  • Draft comment:
    Ensure that the versioning follows a consistent pattern across all files and documentation.
  • Reason this comment was not posted:
    Confidence changes required: 33%
    The pyproject.toml file looks well-structured and doesn't contain any obvious issues related to the rules provided. No secrets or sensitive data are present, and the file is formatted correctly.

Workflow ID: wflow_3GJlWPkSsoDm91bR


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

@paul-paliychuk paul-paliychuk merged commit a9091b0 into main Dec 11, 2024
7 checks passed
@paul-paliychuk paul-paliychuk deleted the paul/zepai-847-clean-llm-inputs-to-avoid-invalid-unicode-or-control branch December 11, 2024 02:27
@github-actions github-actions bot locked and limited conversation to collaborators Dec 11, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants