DVC can not push large file to OSS anymore once the uploading has been interrupted #10644

Open
CNLHC opened this issue Dec 5, 2024 · 2 comments
Labels
A: data-sync Related to dvc get/fetch/import/pull/push fs: oss Related to the Alibaba Cloud OSS filesystem

Comments

CNLHC commented Dec 5, 2024

Bug Report

Description

Currently DVC cannot upload large files to OSS. Even after applying the fix described in #10643, DVC still suffers from a subtle bug: if an upload is interrupted, the file cannot be uploaded again.

Reproduce

dvc init
mkfile -n 200m test.blob
dvc add test.blob
dvc remote add foo oss://<oss_bucket>
dvc push -r foo  # press Ctrl+C to interrupt this upload
dvc push -r foo

The error message from the last push will contain:

ERROR: failed to transfer 'xxxx' - 'coroutine' object has no attribute 'parts'

Expected

DVC should push the file to the remote, either by recovering from the local state or by pruning the local state and launching a new upload session.
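The two acceptable outcomes described above can be sketched as a small decision function. This is only an illustration of the expected behavior, not code from DVC, ossfs, aiooss2, or oss2: `resume_or_restart`, `upload_exists`, and `start_new_upload` are hypothetical names, and treating any validation error as a stale checkpoint is an assumption.

```python
import asyncio


async def resume_or_restart(record, upload_exists, start_new_upload):
    """Return the upload id to continue with.

    record: previously saved checkpoint dict, or None.
    upload_exists: hypothetical async callable(upload_id) -> bool.
    start_new_upload: hypothetical async callable() -> str (new upload id).
    """
    if record is not None:
        try:
            if await upload_exists(record["upload_id"]):
                # Recover: the saved session is still valid on the remote.
                return record["upload_id"]
        except Exception:
            # Assumption: a checkpoint that cannot be validated is stale.
            pass
    # Prune: ignore the saved record and open a fresh upload session.
    return await start_new_upload()
```

With this shape, a broken or expired checkpoint degrades to a fresh upload instead of failing every later push.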

Environment information

Same environment as #10643, with the patch applied.

Additional Information (if any):

A quick fix for this issue is to delete the upload store:

rm -rf ~/.py-oss-upload/

This directory is created by oss2.
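For anyone who wants to script the workaround, the same cleanup can be sketched in Python. The default path is the one given above; the function name `clear_upload_checkpoints` is hypothetical. Like the `rm -rf` command, it discards the checkpoints of every interrupted upload, not only the failing one.

```python
import shutil
from pathlib import Path


def clear_upload_checkpoints(store_dir="~/.py-oss-upload"):
    """Delete the resumable-upload checkpoint directory, if it exists.

    Returns True if a directory was removed, False if there was nothing to remove.
    """
    path = Path(store_dir).expanduser()
    if not path.is_dir():
        return False
    shutil.rmtree(path)
    return True
```

Calling `clear_upload_checkpoints()` before retrying `dvc push` has the same effect as the shell command.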

The root cause is this line: https://github.com/karajan1001/aiooss2/blob/875a06b99881df6fe900b1fed29e3a91dec12a7f/src/aiooss2/resumable.py#L339

aiooss2 passes an AioBucket to oss2, and the latter does not await its calls properly.

flowchart LR
   dvc --> dvc-oss --> ossfs --> aiooss2 --> oss2
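The failure mode can be reproduced in isolation. The sketch below does not use oss2 or aiooss2; `AsyncBucket`, `ListPartsResult`, and the two fetch helpers are hypothetical stand-ins that only mimic the shape of the problem: synchronous code calls a method on an async object without awaiting it, receives a coroutine, and then fails when it reads an attribute from it.

```python
import asyncio


class ListPartsResult:
    """Stand-in for the response object that carries the uploaded parts."""

    def __init__(self, parts):
        self.parts = parts


class AsyncBucket:
    """Hypothetical async bucket: list_parts must be awaited."""

    async def list_parts(self, key, upload_id):
        return ListPartsResult(parts=[])


def sync_fetch_parts(bucket, key, upload_id):
    """Synchronous caller written for a blocking bucket (no await)."""
    result = bucket.list_parts(key, upload_id)  # a coroutine if bucket is async
    try:
        return result.parts
    finally:
        # Close the never-awaited coroutine to avoid a RuntimeWarning.
        if asyncio.iscoroutine(result):
            result.close()


async def async_fetch_parts(bucket, key, upload_id):
    """Async-aware caller: awaits the bucket call before reading .parts."""
    result = await bucket.list_parts(key, upload_id)
    return result.parts


def demo():
    bucket = AsyncBucket()
    try:
        sync_fetch_parts(bucket, "key", "upload-id")
    except AttributeError as exc:
        message = str(exc)
    else:
        message = None
    parts = asyncio.run(async_fetch_parts(bucket, "key", "upload-id"))
    return message, parts
```

Here `demo()` returns the same `'coroutine' object has no attribute 'parts'` message for the synchronous path, while the awaited path returns the parts list. Whether the real fix belongs in aiooss2 (an async reimplementation of the check) or elsewhere is not settled in this report.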
CNLHC changed the title from "DVC can not push large file to OSS if uploading has interrupted" to "DVC can not push large file to OSS anymore once the uploading has been interrupted" on Dec 5, 2024
Member

skshetry commented Dec 5, 2024

Can you share the verbose traceback, please? Add -v to the command above.

Author

CNLHC commented Dec 5, 2024

Can you share the verbose traceback, please? Add -v to the command above.

Sure, here is the detailed log.
As I mentioned, the error is caused by aiooss2, which passes an AsyncIterator to oss2, and oss2 only understands synchronous code.

2024-12-06 07:53:06,608 DEBUG: v3.58.0 (pip), CPython 3.12.7 on macOS-15.1.1-arm64-arm-64bit
2024-12-06 07:53:06,608 DEBUG: command: /Users/foo/.venv/bin/dvc push -r s9t -v
Collecting |3.00 [00:00, 1.68kentry/s]
2024-12-06 07:53:06,727 DEBUG: Preparing to transfer data from '/Users/foo/Project/dvc-bug-repro/.dvc/cache/files/md5' to 'oss://s9t-data/foo/files/md5'
2024-12-06 07:53:06,727 DEBUG: Preparing to collect status from 's9t-data/foo/files/md5'
2024-12-06 07:53:06,727 DEBUG: Collecting status from 's9t-data/foo/files/md5'
2024-12-06 07:53:06,902 DEBUG: Estimated remote size: 4096 files
2024-12-06 07:53:06,902 DEBUG: Large remote (3 oids < 40.96 traverse weight), using object_exists for remaining oids
2024-12-06 07:53:06,902 DEBUG: Querying 3 oids via object_exists
2024-12-06 07:53:06,970 DEBUG: Preparing to collect status from '/Users/foo/Project/dvc-bug-repro/.dvc/cache/files/md5'
2024-12-06 07:53:06,970 DEBUG: Collecting status from '/Users/foo/Project/dvc-bug-repro/.dvc/cache/files/md5'
2024-12-06 07:53:06,970 WARNING: Some of the cache files do not exist neither locally nor on remote. Missing cache files:
md5: e4d6540f99f187bab7d5e0f47e5969a9
md5: 3566de3a97906edb98d004d6b947ae9b
2024-12-06 07:53:06,997 ERROR: failed to transfer '26993a1e00ef279b6263536b035b01aa' - 'coroutine' object has no attribute 'parts'
Traceback (most recent call last):
  File "/Users/foo/.venv/lib/python3.12/site-packages/dvc_objects/fs/generic.py", line 349, in transfer
    _try_links(
  File "/Users/foo/.venv/lib/python3.12/site-packages/dvc_objects/fs/generic.py", line 281, in _try_links
    return copy(
           ^^^^^
  File "/Users/foo/.venv/lib/python3.12/site-packages/dvc_objects/fs/generic.py", line 88, in copy
    return _put(
           ^^^^^
  File "/Users/foo/.venv/lib/python3.12/site-packages/dvc_objects/fs/generic.py", line 161, in _put
    _put_one(from_paths[0], to_paths[0])
  File "/Users/foo/.venv/lib/python3.12/site-packages/dvc_objects/fs/generic.py", line 151, in _put_one
    return to_fs.put_file(
           ^^^^^^^^^^^^^^^
  File "/Users/foo/.venv/lib/python3.12/site-packages/dvc_objects/fs/base.py", line 635, in put_file
    self.fs.put_file(os.fspath(from_file), to_info, callback=callback, **kwargs)
  File "/Users/foo/.venv/lib/python3.12/site-packages/fsspec/asyn.py", line 118, in wrapper
    return sync(self.loop, func, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/foo/.venv/lib/python3.12/site-packages/fsspec/asyn.py", line 103, in sync
    raise return_result
  File "/Users/foo/.venv/lib/python3.12/site-packages/fsspec/asyn.py", line 56, in _runner
    result[0] = await coro
                ^^^^^^^^^^
  File "/Users/foo/.venv/lib/python3.12/site-packages/ossfs/async_oss.py", line 392, in _put_file
    await self._call_oss(
  File "/Users/foo/.venv/lib/python3.12/site-packages/ossfs/async_oss.py", line 161, in _call_oss
    out = await method(service, *args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/foo/.venv/lib/python3.12/site-packages/aiooss2/resumable.py", line 140, in resumable_upload
    result = await uploader.upload()
             ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/foo/.venv/lib/python3.12/site-packages/aiooss2/resumable.py", line 251, in upload
    await self._load_record()
  File "/Users/foo/.venv/lib/python3.12/site-packages/aiooss2/resumable.py", line 424, in _load_record
    record: Optional[Dict] = self._verify_record(record)
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/foo/.venv/lib/python3.12/site-packages/aiooss2/resumable.py", line 339, in _verify_record
    if record and not self.__upload_exists(record["upload_id"]):
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/foo/.venv/lib/python3.12/site-packages/oss2/resumable.py", line 727, in __upload_exists
    list(iterators.PartIterator(self.bucket, self.key, upload_id, '0', max_parts=1, headers=valid_headers))
  File "/Users/foo/.venv/lib/python3.12/site-packages/oss2/iterators.py", line 40, in __next__
    self.fetch_with_retry()
  File "/Users/foo/.venv/lib/python3.12/site-packages/oss2/iterators.py", line 48, in fetch_with_retry
    self.is_truncated, self.next_marker = self._fetch()
                                          ^^^^^^^^^^^^^
  File "/Users/foo/.venv/lib/python3.12/site-packages/oss2/iterators.py", line 275, in _fetch
    self.entries = result.parts
                   ^^^^^^^^^^^^
AttributeError: 'coroutine' object has no attribute 'parts'

Pushing
2024-12-06 07:53:07,004 ERROR: failed to push data to the cloud - 1 files failed to upload
Traceback (most recent call last):
  File "/Users/foo/.venv/lib/python3.12/site-packages/dvc/commands/data_sync.py", line 64, in run
    processed_files_count = self.repo.push(
                            ^^^^^^^^^^^^^^^
  File "/Users/foo/.venv/lib/python3.12/site-packages/dvc/repo/__init__.py", line 58, in wrapper
    return f(repo, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/foo/.venv/lib/python3.12/site-packages/dvc/repo/push.py", line 174, in push
    raise UploadError(failed_count)
dvc.exceptions.UploadError: 1 files failed to upload

@shcheklein shcheklein added fs: oss Related to the Alibaba Cloud OSS filesystem A: data-sync Related to dvc get/fetch/import/pull/push labels Dec 7, 2024