Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dev/manage meta #518

Open
wants to merge 173 commits into
base: main
Choose a base branch
from
Open

Dev/manage meta #518

wants to merge 173 commits into from

Conversation

BeachWang
Copy link
Collaborator

@BeachWang BeachWang commented Dec 24, 2024

Have organized and consolidated the generated metadata uniformly.

  1. Meta data produced by mapper OPs is stored in Fields.meta field. Corresponding keys are list in MetaKeys.
  2. For convenience of split a batch sample, meta data produced by aggregator OPs is stored in Fields.agg field. Corresponding keys are list in AggKeys. Here, we add a default Fields.agg field to all samples in all aggregator operators, without adding a decorator marker TAGGING_OPS like we do with mappers.
  3. Unify the interface for generating meta operators, allowing modification of meta keys. If a meta key already exists, it should simply return without making any changes.
  4. Add corresponding unit test.
  5. Adjust the role playing demo according to the changes.
  6. Remove nested_set function.

@BeachWang BeachWang self-assigned this Dec 24, 2024
@BeachWang BeachWang marked this pull request as draft December 24, 2024 02:24
@BeachWang BeachWang marked this pull request as ready for review December 31, 2024 06:38
@BeachWang BeachWang requested review from HYLcool and yxdyc December 31, 2024 08:37
@BeachWang BeachWang added the enhancement New feature or request label Dec 31, 2024
@yxdyc yxdyc added dj:dataset issues/PRs about the dj-dataset dj:core issues/PRs about the core functions of Data-Juicer labels Jan 2, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dj:core issues/PRs about the core functions of Data-Juicer dj:dataset issues/PRs about the dj-dataset enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants