-
Notifications
You must be signed in to change notification settings - Fork 409
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FileDataLoader fails to read the file when size > INT32_MAX #4435
Conversation
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/4435
Note: Links to docs will display an error until the docs builds have been completed. ❌ 1 Cancelled JobAs of commit 0a3116c with merge base dbf7d6e (): CANCELLED JOB - The following job was cancelled. Please retry:
This comment was automatically generated by Dr. CI and updates every 15 minutes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much for this fix! One super-minor request but the code changes look great.
I would suggest adding a unit test for this, but I really don't want our test jobs creating and reading a >2GB file. But because there's no automated testing, please add a Test Plan:
section to the PR description explaining how you tested that this works.
@@ -189,7 +191,12 @@ Result<FreeableBuffer> FileDataLoader::load( | |||
size_t needed = size; | |||
uint8_t* buf = reinterpret_cast<uint8_t*>(aligned_buffer); | |||
while (needed > 0) { | |||
ssize_t nread = ::read(fd_, buf, needed); | |||
// read on macos would fail with EINVAL if size > INT32_MAX |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please begin comments with uppercase letters and end with a period, to be consistent with the rest of the file. Thanks for adding this comment, though, to explain why this extra logic is necessary, and where to focus testing if it changes.
// read on macos would fail with EINVAL if size > INT32_MAX | |
// Reads on macOS will fail with EINVAL if size > INT32_MAX. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When you get a chance please update this comment, then we can land the PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
That's the reason for not adding test, I didn't want to create a 2GB file for unit testing this. |
@shoumikhin has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@shoumikhin has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@shoumikhin merged this pull request in 1e4603d. |
On macOS, the
read
function will fail with anEINVAL
error if the size parameter exceedsINT32_MAX
. This update addresses the issue by adding a check to ensure that the read size does not surpassINT32_MAX
. On Linux, the maximum permissible read size is 2,147,479,552 bytes ( <INT32_MAX
), so attempting to read beyond this limit is inconsequential.Test Plan:
Exporting llama3 with
python -m examples.models.llama2.export_llama --checkpoint examples/models/llama-2-7B/consolidated.00.pth --params examples/models/llama-2-7B/params.json --coreml --disable_dynamic_shape -kv
Without fix
Fails with
invalid argument
error.With fix
Succeeds.