-
-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Restructure makes a copy #146
Comments
I think the story here is that Restructure makes a new copy of the model, which allocates, and this costs about 30μs: julia> @btime $re($params); # This is the reconstruction cost
min 11.125 μs, mean 29.891 μs (24 allocations, 257.95 KiB)
julia> @btime copy($params); # ... and it's mostly allocation, same mean:
min 6.007 μs, mean 29.209 μs (2 allocations, 257.05 KiB) Normally the allocation of the forward pass dwarfs this, but if you don't make batches and do this for every sample, then the model allocation dominates. julia> input = rand(Float32, layer_size); # As above, batch size of 1
julia> @btime $layer($input); # ... hence alloc much smaller here, unusual.
min 6.800 μs, mean 8.970 μs (2 allocations, 2.12 KiB)
julia> @btime $re($params)($input); # Complete, as above
min 20.458 μs, mean 51.587 μs (26 allocations, 260.08 KiB) This isn't a regime anyone got around to optimising. It would probably be easy to make Restructure either (1) copy into the existing model, or (2) create a new model out of views not copies. Note that ComponentArrays is doing nothing here, Note also that allocations in the gradient will typically be larger than in the forward pass. When last I checked, destructure does this in a more efficient way than ComponentArrays, which made a copy of the full parameter vector repeatedly. This will matter (for larger models) in exactly the regime you have selected. |
Thanks for your thorough reply! Good to know that ComponentArrays isn't doing anything. The offending line seems to be this one: Optimisers.jl/src/destructure.jl Line 103 in 8a37946
I wonder if just using |
Changing that line to use a view would probably work fine. ProjectTo won't mind, but I'm not sure whether all operations downstream will be happy with a reshaped view (e.g. will CuArray dispatch work?) Maybe the bigger question is what the interface should be, e.g. whether this should be a keyword option. Or whether it should be a separate |
I think we should have a copyless alternative to restructure, possible name |
In some situations, you have to restructure a lot if you use Flux, for instance if you want to run your batches as seperate solves in DiffEqFlux using an EnsembleProblem. You have to use something like a ComponentArray to pass the parameters through the solver and to let the adjoint methods do their work in differentiating the solve. But restructuring using a ComponentArray is unreasonably(?) slow in Flux. Switching to Lux eliminates those problems, but it seems like something that could be implemented better in Flux or ComponentArrays.
Example code:
Here, we spend 10% of the time in sgemv matrix multiplication, another 10% in the rest of the Dense call and about 75% in the Restructure. This gets worse if the networks are smaller. As far as I can read the flame graph, the restructure seems to spend a lot of time in the GC:
Could there be a way to mitigate this specific problem? In particular if you use the same parameters. I think this would make some example code a lot faster too.
I also opened a discourse about this because I'm not sure if it's an issue with Flux specifically: https://discourse.julialang.org/t/flux-restructure-for-componentarrays-jl-unreasonably-slow/97849/3
The text was updated successfully, but these errors were encountered: