Checkpointer support for Ray-Mode #487

Open
2 tasks done
yxdyc opened this issue Nov 12, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

@yxdyc
Collaborator

yxdyc commented Nov 12, 2024

Search before continuing

  • I have searched the Data-Juicer issues and found no similar feature requests.

Description

Currently, the dj_ckpt_manager and executor only support the HF dataset. They essentially perform three actions:

  1. Tracks and saves the executed operation list from OP_1 to OP_i.
  2. Saves the processed dataset ( D_{op_i} ).
  3. Checks and loads ( D_{op_i} ) when the feature is enabled during re-processing.
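
The three actions above can be sketched in plain Python. This is only an illustration under stated assumptions: `SimpleCheckpointer`, its file names, and the idea of identifying operations by name are hypothetical and are not Data-Juicer's actual dj_ckpt_manager interface. A real implementation would likely also compare each operation's arguments.

```python
import json
import tempfile
from pathlib import Path


class SimpleCheckpointer:
    """Hypothetical stand-in for a checkpoint manager; not Data-Juicer's API."""

    def __init__(self, ckpt_dir):
        self.ckpt_dir = Path(ckpt_dir)
        self.ops_file = self.ckpt_dir / "ops.json"
        self.data_file = self.ckpt_dir / "dataset.jsonl"

    def save(self, done_ops, samples):
        """Actions 1 and 2: record OP_1..OP_i and persist D_{op_i}."""
        self.ckpt_dir.mkdir(parents=True, exist_ok=True)
        with self.data_file.open("w", encoding="utf-8") as f:
            for sample in samples:
                f.write(json.dumps(sample) + "\n")
        # Write the op record last so it only appears once the data is on disk.
        self.ops_file.write_text(json.dumps(done_ops), encoding="utf-8")

    def load(self, planned_ops):
        """Action 3: return (remaining_ops, samples), or None if unusable."""
        if not (self.ops_file.exists() and self.data_file.exists()):
            return None
        done_ops = json.loads(self.ops_file.read_text(encoding="utf-8"))
        # Only resume if the recorded ops are a prefix of the planned pipeline.
        if planned_ops[: len(done_ops)] != done_ops:
            return None
        with self.data_file.open(encoding="utf-8") as f:
            samples = [json.loads(line) for line in f]
        return planned_ops[len(done_ops):], samples


with tempfile.TemporaryDirectory() as tmp:
    ckpt = SimpleCheckpointer(tmp)
    ckpt.save(["clean_text", "dedup"], [{"text": "hello"}])
    # Same pipeline with one more op: resume after "dedup".
    resumed = ckpt.load(["clean_text", "dedup", "filter_len"])
    # Reordered pipeline: the checkpoint no longer applies.
    changed = ckpt.load(["dedup", "clean_text"])
```

The prefix check is the important design choice here: a saved dataset is only reused when the operations that produced it still match the start of the current pipeline.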

It should be straightforward to extend this feature to ray_executor. For steps 2 and 3, we can implement a few new interfaces that snapshot Ray Data state and write it to persistent storage.
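
One possible shape for those interfaces, as a rough sketch rather than a proposal for the final design: materialize the Ray dataset as Parquet and read it back on resume. The function names, the `ops.json` record, and the directory layout are assumptions made for illustration. `ckpt_dir` is assumed to be a path on a filesystem that every node in the cluster can reach (for example a shared mount). `Dataset.write_parquet` and `ray.data.read_parquet` are existing Ray Data calls.

```python
import json
from pathlib import Path


def resumable_prefix(done_ops, planned_ops):
    """Return how many planned ops can be skipped, or 0 if the record does not match."""
    n = len(done_ops)
    return n if planned_ops[:n] == done_ops else 0


def save_ray_checkpoint(ds, ckpt_dir, done_ops):
    """Persist a ray.data.Dataset as Parquet, then record the ops that produced it."""
    ckpt_dir = Path(ckpt_dir)
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    # Writing triggers execution of the lazy dataset up to this point.
    ds.write_parquet(str(ckpt_dir / "data"))
    # Record the ops only after the data has been written.
    (ckpt_dir / "ops.json").write_text(json.dumps(done_ops), encoding="utf-8")


def load_ray_checkpoint(ckpt_dir, planned_ops):
    """Return (dataset, remaining_ops), or None when no matching checkpoint exists."""
    import ray  # imported lazily so resumable_prefix works without Ray installed

    ckpt_dir = Path(ckpt_dir)
    ops_file = ckpt_dir / "ops.json"
    if not ops_file.exists():
        return None
    done_ops = json.loads(ops_file.read_text(encoding="utf-8"))
    skip = resumable_prefix(done_ops, planned_ops)
    if skip == 0:
        return None
    return ray.data.read_parquet(str(ckpt_dir / "data")), planned_ops[skip:]
```

In this sketch, writing a second checkpoint into the same `data` directory could leave files from the earlier write in place, so a real implementation would probably use one directory per checkpointed step or clear the directory first.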

Use case

No response

Additional

No response

Are you willing to submit a PR for this feature?

  • Yes, I'd like to help by submitting a PR!
@yxdyc yxdyc added the enhancement New feature or request label Nov 12, 2024
@yxdyc yxdyc added this to the Distributed processing milestone Nov 12, 2024
@vincent-pli

vincent-pli commented Nov 14, 2024

@yxdyc I'm new to Ray and don't understand how its local data writing works, specifically the local:// scheme.
Does it write to the disk of the host server, the client node, or a worker node? Thanks.

5 participants