Add ArrowTypes.jl dependency to serialize optimizers? #77
Comments
ArrowTypes.jl is super light indeed. Optimisers.jl is also very light, but I don't see any problem in taking on this new dependency, since it seems useful to people and doing it somewhere else would be type piracy.
Can you sketch what this does, for people who don't know anything about Arrow? At the moment calling [...]
The purpose is similar to StructTypes.jl, if you are familiar with that. It is to define a couple of methods that describe precisely how to serialize an object into the Arrow format. For example: https://arrow.juliadata.org/dev/manual/#Custom-types. Another way to say it is that Arrow has some primitive types [...]
My interest in Arrow + Optimisers is in serializing optimisers, e.g. in order to restart training after a crash. Using ArrowTypes to map Julia types to Arrow types allows one to use Arrow.jl to write a complicated Julia object, like the nested optimiser state, into a vector of bytes, which can be saved somewhere and reloaded later.
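A minimal sketch of that round trip, following the pattern in the Arrow.jl manual linked above. `ToyMomentum` and the registered name are hypothetical stand-ins chosen for illustration; this is not the Optimisers.jl state type, and not necessarily how a PR here would define things:

```julia
using Arrow, ArrowTypes

# Hypothetical stand-in for an optimiser rule (NOT an Optimisers.jl type).
struct ToyMomentum
    eta::Float64
    rho::Float64
end

# Register a name so the Julia type is recorded in the Arrow metadata
# and can be recovered when the data is read back.
const TOY_NAME = Symbol("JuliaLang.ToyMomentum")
ArrowTypes.arrowname(::Type{ToyMomentum}) = TOY_NAME
ArrowTypes.JuliaType(::Val{TOY_NAME}) = ToyMomentum

# Write a one-column table to an in-memory buffer and grab the raw bytes.
table = (rule = [ToyMomentum(0.01, 0.9)],)
io = IOBuffer()
Arrow.write(io, table)
bytes = take!(io)

# Later (e.g. after a crash), rebuild the table from the saved bytes.
restored = Arrow.Table(bytes)
rule = restored.rule[1]
```

Only the two ArrowTypes methods are specific to the custom type; Arrow.jl does the actual writing and reading, which is why the definitions themselves only need the light ArrowTypes dependency.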
If there's a canonical way to map to a flat vector of primitive types and back, that would serve the purpose as well. As @ToucheSir has pointed out though, being able to serialize the nested structure has the advantage that the result is a bit more standalone and doesn't rely on having the code to reconstruct the nested structure from the flat vector.
On the topic of StructTypes, is there not some intermediate interface we can implement such that this functionality is not tied to the Arrow format? I'm envisioning an equivalent to Serde in Rust.
We don’t; StructTypes is designed around JSON afaik, and it is not expressive enough, in that it can’t capture the metadata that makes roundtripping nested objects work smoothly. I don’t know how serde works, but I imagine it’s challenging to have a very general intermediary. It would be nice though. Perhaps @quinnj has thought about it.
Yeah, it's been low on my priority list to try and see if there's some way to evolve StructTypes.jl/ArrowTypes.jl so we don't need both. They overlap quite a bit, so it's unfortunate when certain cases have to overload both to get both JSON and Arrow compat. I'll try to find some time next week to start sketching out a plan for the future. I'll also take a look at serde and see if we can get some inspiration from there.
I found out today that LightBSON.jl uses StructTypes. @ancapdev how has that worked out for you?
I haven't actually used the [...]
@quinnj did you happen to do any thinking on this over the last few weeks?
One thought is that weak dependencies / package extensions might make adding ArrowTypes definitions zero-cost here. That would be Julia 1.9-only, or need Requires.jl for pre-1.9 support.
I still feel there is a need for some plan that will generalize this to work with other serialization formats, but I would be fine with a PR adding a package extension.
I took a look at it, but with mutable [...]
In LegolasFlux we just use [...]
But if we add serialization at the level of an individual [...]
So I think really what we want is a serialization API that serializes a whole state (possibly by [...]
Adding a level of indirection is also nice to prevent changing internals of e.g. Leaf from breaking deserialization.
I think both are required because [...]
What other serialization backends would be useful BTW? LightBSON via StructTypes? Something else also?
Something BSON-ish, something HDF5-ish, maybe exploring newer DL-focused formats like https://github.com/huggingface/safetensors.
ArrowTypes is a light package used for defining how to serialize objects to the Arrow format with Arrow.jl. Arrow.jl is the heavier dependency that actually does the serialization.
We could add a few ArrowTypes definitions in order to serialize optimizers to Arrow. Ref beacon-biosignals/LegolasFlux.jl#17 (comment)
Would be interested to know if PRs for that would be accepted here.