
GGUF: ggml backend support for writing tensor data #1033

Open — JohannesGaessler wants to merge 2 commits into master

Conversation

JohannesGaessler
Collaborator

This PR adds ggml backend support for writing tensor data to a GGUF file. Currently a workaround is needed where the data is first copied to new tensors with data in RAM, which the GGUF code can then access via memcpy. This PR makes it so that instead a fake tensor is reconstructed from gguf_tensor_info, which can then be passed to ggml_backend_tensor_get. I'm not sure whether this is the best solution; a lot of the fields in gguf_tensor_info are the same as in ggml_tensor. Is there a reason why you couldn't just directly store a ggml_tensor as one of the fields in gguf_tensor_info?

@slaren
Collaborator

slaren commented Dec 1, 2024

It should be ok to store the tensor in gguf_tensor_info, but I think it would require a refactor to avoid duplicating the data since the gguf loader also uses this struct to load the tensor info.

@JohannesGaessler
Copy link
Collaborator Author

I did a refactor to store a ggml_tensor instead of effectively mirrored fields. It seems to work correctly for MNIST, but I think I'll open a PR in the llama.cpp repository to ensure that it works there as well (there are also some slight API changes that I would suggest). While I'm at it I'll also tackle #1038.

Comment on lines +6397 to +6433
/* if (info->n_dims > GGML_MAX_DIMS) { */
/* fprintf(stderr, "%s: invalid number of dimensions (%" PRIu32 ")\n", __func__, info->n_dims); */
/* return false; */
/* } */

/* if (info->type < 0 || info->type >= GGML_TYPE_COUNT) { */
/* fprintf(stderr, "%s: invalid type (%d)\n", __func__, info->type); */
/* return false; */
/* } */

/* if (strlen(info->name.data) >= GGML_MAX_NAME) { */
/* fprintf(stderr, "%s: tensor '%s' name is too long\n", __func__, info->name.data); */
/* return false; */
/* } */

/* for (uint32_t i = 0; i < info->n_dims; ++i) { */
/* if (info->ne[i] <= 0) { */
/* fprintf(stderr, "%s: invalid number of elements (%" PRIu64 ")\n", __func__, info->ne[i]); */
/* return false; */
/* } */
/* } */

/* // prevent overflow for total number of elements */
/* if (INT64_MAX/info->ne[1] <= info->ne[0]) { */
/* fprintf(stderr, "%s: invalid number of elements (%" PRIu64 ")\n", __func__, info->ne[1]); */
/* return false; */
/* } */

/* if (INT64_MAX/info->ne[2] <= info->ne[0]*info->ne[1]) { */
/* fprintf(stderr, "%s: invalid number of elements (%" PRIu64 ")\n", __func__, info->ne[2]); */
/* return false; */
/* } */

/* if (INT64_MAX/info->ne[3] <= info->ne[0]*info->ne[1]*info->ne[2]) { */
/* fprintf(stderr, "%s: invalid number of elements (%" PRIu64 ")\n", __func__, info->ne[3]); */
/* return false; */
/* } */
Owner


Why are these checks commented?

Collaborator Author


This was just something I did for a WIP version. I have a version with more changes and the checks re-enabled on my local machine. I'll make a PR to llama.cpp either today or tomorrow.

3 participants