feat(component): enhance merge data with standard operations #5125

Merged


raphaelchristi

Description

This PR refactors the Merge Data component around four standard merge operations and changes its output type to DataFrame, making the merging process more capable and easier to follow.


Changes

  • Standardized Merge Operations:

    • concatenate: Combines text values with newlines.
    • append: Adds data as new rows.
    • merge: Combines values into lists.
    • join: Adds columns with suffixes.
  • Output Improvements:

    • Change output type to DataFrame for enhanced data handling and manipulation.
  • Documentation:

    • Add detailed documentation for each operation, improving clarity and usability.
  • Error Handling:

    • Add detailed logging for easier debugging and traceability.
  • Refactored Merge Logic:

    • Separate merge logic into dedicated methods for cleaner, modular code.
  • UI Enhancements:

    • Add informative tooltips to aid users in selecting the correct merge operation.
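
The four operations listed above can be illustrated with a minimal sketch on plain dictionaries. This is not the component's code: the `Row` alias, the function names, the handling of non-text values, and the `_2`-style column suffix are all assumptions made for illustration.

```python
from typing import Any

Row = dict[str, Any]  # hypothetical stand-in for the payload of one Data input


def concatenate(inputs: list[Row]) -> list[Row]:
    """Join text values that share a key with newlines; returns one combined row."""
    combined: Row = {}
    for row in inputs:
        for key, value in row.items():
            if isinstance(combined.get(key), str) and isinstance(value, str):
                combined[key] = f"{combined[key]}\n{value}"
            else:
                # First occurrence, or a non-text value: the latest value wins (assumption).
                combined[key] = value
    return [combined]


def append(inputs: list[Row]) -> list[Row]:
    """Keep every input as its own row."""
    return [dict(row) for row in inputs]


def merge(inputs: list[Row]) -> list[Row]:
    """Collect values that share a key into a list; returns one combined row."""
    collected: dict[str, list[Any]] = {}
    for row in inputs:
        for key, value in row.items():
            collected.setdefault(key, []).append(value)
    # Keys seen only once keep their plain value instead of a one-item list.
    return [{key: vals[0] if len(vals) == 1 else vals for key, vals in collected.items()}]


def join(inputs: list[Row]) -> list[Row]:
    """Place every input's keys side by side as columns; later inputs get a suffix."""
    combined: Row = {}
    for index, row in enumerate(inputs, start=1):
        for key, value in row.items():
            # The "_<index>" suffix format is an assumption, not taken from the PR.
            column = key if index == 1 else f"{key}_{index}"
            combined[column] = value
    return [combined]
```

Each function returns a list of row dictionaries, which is a shape that `pandas.DataFrame(rows)` accepts, so a result like this could be wrapped in a DataFrame as the final step.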

Code Comparison

Before

def merge_data(self) -> list[Data]:
    # Single method handling all merging (excerpt)
    all_keys: set[str] = set()
    # Collect every key that appears in any of the component's inputs
    for data_input in self.data_inputs:
        all_keys.update(data_input.data.keys())

After

def merge_data(self) -> DataFrame:
    # Clear operation selection with enum
    operation = MergeOperation(self.operation)
    # Process using dedicated methods
    df = self._process_operation(operation)
    return DataFrame(df)
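
As a rough sketch of the enum-based selection shown in the "After" snippet: the member names below come from this PR, but the string values, the `HANDLERS` table, and the `process_operation` function are hypothetical, and the handlers are placeholders that only report what they would do.

```python
from enum import Enum
from typing import Any, Callable


class MergeOperation(str, Enum):
    # Member names follow the PR; the lowercase string values are an assumption.
    CONCATENATE = "concatenate"
    APPEND = "append"
    MERGE = "merge"
    JOIN = "join"


Rows = list[dict[str, Any]]

# Hypothetical dispatch table: one dedicated handler per operation.
# Real handlers would build the merged rows; these only return a label.
HANDLERS: dict[MergeOperation, Callable[[Rows], str]] = {
    MergeOperation.CONCATENATE: lambda rows: f"concatenate {len(rows)} inputs",
    MergeOperation.APPEND: lambda rows: f"append {len(rows)} inputs",
    MergeOperation.MERGE: lambda rows: f"merge {len(rows)} inputs",
    MergeOperation.JOIN: lambda rows: f"join {len(rows)} inputs",
}


def process_operation(selected: str, rows: Rows) -> str:
    """Convert the dropdown value to an enum member and call its handler."""
    operation = MergeOperation(selected)  # raises ValueError for an unknown name
    return HANDLERS[operation](rows)
```

Looking the value up through the enum means an unsupported operation name fails immediately with a `ValueError` instead of silently falling through a chain of string comparisons.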

Benefits

  • Enhanced readability and maintainability.
  • Improved user experience with clear guidance.
  • Robust error handling ensures fewer runtime issues.
  • Modular methods facilitate easier future extensions.

- Add standard merge operations (concatenate, append, merge, join)
- Add operation selection via dropdown
- Return DataFrame output type
- Implement separate merge strategies
@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. enhancement New feature or request labels Dec 6, 2024
- Add MIN_INPUTS_REQUIRED constant
- Use descriptive DataFrame variable names
- Move return statement to else block
- Use list comprehension for better performance
- Fix unused loop variable
- Improve overall code formatting
@raphaelchristi raphaelchristi force-pushed the feat/merge-data-standard-operations branch from 30f0388 to aff4dd9 Compare December 6, 2024 18:09

@ogabrielluiz ogabrielluiz left a comment

Hey @raphaelchristi

How are you?

langflow.schema.DataFrame works exactly like a pd.DataFrame so you don't have to do anything differently while using it. It just has some helper features and methods.

@raphaelchristi raphaelchristi force-pushed the feat/merge-data-standard-operations branch from b31c5f2 to 344127d Compare December 11, 2024 18:37

@ogabrielluiz ogabrielluiz left a comment

Please use only English in the component.

@erichare erichare removed their request for review December 12, 2024 17:00
@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Dec 19, 2024
@github-actions github-actions bot added enhancement New feature or request and removed enhancement New feature or request labels Dec 19, 2024
- Improved type hinting for combined data structures to enhance code clarity.
- Streamlined the concatenation and merging operations to ensure consistent handling of string and object types.
- Updated the logic to correctly append values to lists when merging data inputs, improving data integrity in the merging process.
@github-actions github-actions bot added enhancement New feature or request and removed enhancement New feature or request labels Dec 20, 2024
@dosubot dosubot bot added size:M This PR changes 30-99 lines, ignoring generated files. and removed size:L This PR changes 100-499 lines, ignoring generated files. labels Dec 20, 2024
- Deleted the DataMergerComponent to streamline the processing components.
- Updated the __init__.py file to reflect the removal of the DataMergerComponent from the exports.
- Introduced a new enum, MergeOperation, to define various data merging strategies: CONCATENATE, APPEND, MERGE, and JOIN.
- Updated the merge_data method to return a DataFrame instead of a list of Data objects, improving data handling.
- Enhanced input validation to ensure a minimum number of data inputs are provided.
- Streamlined the merging logic to support different operations, improving flexibility and usability of the component.
@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. and removed size:M This PR changes 30-99 lines, ignoring generated files. labels Dec 20, 2024
- Moved MIN_INPUTS_REQUIRED constant outside the class for better visibility and consistency.
- Updated the merge_data method to reference the new constant instead of the class attribute.
- Improved error logging message for clarity.
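
A minimal sketch of the input check described in these commits, assuming a module-level constant and a standard-library logger. The value `2`, the `validate_inputs` name, and the log wording are assumptions; the PR text does not state them.

```python
import logging

logger = logging.getLogger(__name__)

# Module-level constant, as described in the commit message.
# The value 2 is an assumption; the PR text does not give the number.
MIN_INPUTS_REQUIRED = 2


def validate_inputs(data_inputs: list[dict]) -> bool:
    """Return True when enough inputs were provided to perform a merge."""
    if len(data_inputs) < MIN_INPUTS_REQUIRED:
        logger.warning(
            "Need at least %d inputs to merge, got %d.",
            MIN_INPUTS_REQUIRED,
            len(data_inputs),
        )
        return False
    return True
```

Keeping the threshold at module level lets both the check and the log message refer to a single name, so changing the minimum only requires editing one line.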
@ogabrielluiz ogabrielluiz added this pull request to the merge queue Dec 20, 2024
Merged via the queue into langflow-ai:main with commit 68c36c4 Dec 20, 2024
2 checks passed
Labels
enhancement New feature or request lgtm This PR has been approved by a maintainer size:L This PR changes 100-499 lines, ignoring generated files.
2 participants