I have searched the Data-Juicer issues and found no similar feature requests.
Description
Currently, dj_ckpt_manager and the executor support checkpointing only for HF (Hugging Face) datasets. They essentially perform three actions:
1. Track and save the list of executed operations, from OP_1 to OP_i.
2. Save the processed dataset $D_{op_i}$.
3. Check for and load $D_{op_i}$ when the feature is enabled during re-processing.
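The three actions above can be sketched as a small, storage-agnostic manager. This is a minimal illustration under assumptions, not Data-Juicer's actual dj_ckpt_manager: the class `SimpleCkptManager`, the `ops.json` file, the prefix-matching resume rule, and the JSON save/load hooks are all hypothetical. The point is that the op-list bookkeeping (action 1) is independent of the dataset backend, so only the save/load hooks (actions 2 and 3) would need Ray-specific versions.

```python
import json
import os
import tempfile
from typing import Any, Callable, List, Optional, Tuple


class SimpleCkptManager:
    """Hypothetical, storage-agnostic checkpoint manager (not Data-Juicer's API).

    The dataset format is hidden behind `save_fn`/`load_fn`, so the same
    bookkeeping could back either an HF dataset or a Ray dataset.
    """

    def __init__(self, ckpt_dir: str,
                 save_fn: Callable[[Any, str], None],
                 load_fn: Callable[[str], Any]) -> None:
        self.ckpt_dir = ckpt_dir
        self.save_fn = save_fn
        self.load_fn = load_fn
        self.ops_path = os.path.join(ckpt_dir, "ops.json")
        self.data_path = os.path.join(ckpt_dir, "dataset")

    def save(self, executed_ops: List[str], dataset: Any) -> None:
        """Persist D_{op_i} and record the executed ops OP_1 .. OP_i."""
        os.makedirs(self.ckpt_dir, exist_ok=True)
        self.save_fn(dataset, self.data_path)  # action 2: persist D_{op_i}
        with open(self.ops_path, "w") as f:    # action 1: track OP_1 .. OP_i
            json.dump(executed_ops, f)

    def load(self, planned_ops: List[str]) -> Tuple[Optional[Any], List[str]]:
        """Action 3: return (checkpointed dataset or None, ops still to run)."""
        if not os.path.exists(self.ops_path):
            return None, list(planned_ops)
        with open(self.ops_path) as f:
            done = json.load(f)
        # The checkpoint is reusable only if its op list is a prefix of the new plan.
        if planned_ops[:len(done)] != done:
            return None, list(planned_ops)
        return self.load_fn(self.data_path), planned_ops[len(done):]


def save_json(data: Any, path: str) -> None:
    """Toy persistence hook standing in for a real dataset writer."""
    with open(path, "w") as f:
        json.dump(data, f)


def load_json(path: str) -> Any:
    """Toy loader matching `save_json`."""
    with open(path) as f:
        return json.load(f)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        mgr = SimpleCkptManager(d, save_json, load_json)
        mgr.save(["op_a", "op_b"], [{"text": "hello"}])
        ds, remaining = mgr.load(["op_a", "op_b", "op_c"])
        print(ds, remaining)  # [{'text': 'hello'}] ['op_c']
```

Requiring the saved op list to be a prefix of the new plan is one simple way to decide whether a checkpoint is still valid; a real implementation might also compare op arguments.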
It would be straightforward to extend this feature to ray_executor. For steps 2 and 3, we can implement a few new interfaces for snapshotting Ray Data states and saving them to persistent storage.
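Steps 2 and 3 on the Ray side could be sketched with Ray Data's `Dataset.write_parquet` and `ray.data.read_parquet`. This is a minimal sketch under assumptions: the Parquet format, the one-directory-per-op layout, and the helper names `snapshot_path`, `save_ray_snapshot`, and `load_ray_snapshot` are hypothetical and not part of Data-Juicer. `ray` is imported lazily so that the path and save helpers can be exercised without a Ray installation.

```python
import os
from typing import Any


def snapshot_path(ckpt_root: str, op_index: int, op_name: str) -> str:
    """Hypothetical layout: one Parquet directory per checkpointed op."""
    # Zero-padded index keeps snapshot directories sorted in execution order.
    return os.path.join(ckpt_root, f"{op_index:04d}_{op_name}")


def save_ray_snapshot(ds: Any, ckpt_root: str, op_index: int, op_name: str) -> str:
    """Materialize a lazy Ray Dataset and persist it as Parquet files (step 2).

    `ds` is expected to be a `ray.data.Dataset`; calling `write_parquet`
    triggers execution of all pending ops up to `op_name`.
    """
    path = snapshot_path(ckpt_root, op_index, op_name)
    # On a multi-node cluster, ckpt_root should point to shared storage
    # (e.g. an S3 or NFS path) so that all write tasks land in one place.
    ds.write_parquet(path)
    return path


def load_ray_snapshot(path: str) -> Any:
    """Reload a snapshot (step 3) as a new `ray.data.Dataset`."""
    import ray  # imported lazily so the helpers above work without Ray installed

    return ray.data.read_parquet(path)
```

Because Ray Datasets execute lazily, writing the snapshot is what actually runs OP_1 to OP_i; reloading with `read_parquet` yields a dataset whose plan starts from the saved files instead of replaying those ops.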
Use case
No response
Additional
No response
Are you willing to submit a PR for this feature?
Yes, I'd like to help by submitting a PR!
@yxdyc I'm new to Ray and don't quite understand how Ray writes local data, specifically the `local://` scheme. Does it write to the disk of the host server, the client node, or the worker nodes? Thanks.