Support IFRT #164
Reactant.jl Benchmarks
| Benchmark suite | Current: 7a96406 | Previous: deefd18 | Ratio |
|---|---|---|---|
| ViT base (256 x 256 x 3 x 32)/forward/CUDA/Reactant | 1301347763 ns | 1354313354 ns | 0.96 |
| ViT base (256 x 256 x 3 x 32)/forward/CUDA/Lux | 217103056 ns | 208305734 ns | 1.04 |
| ViT base (256 x 256 x 3 x 32)/forward/CPU/Reactant | 5365149260 ns | 5150619393 ns | 1.04 |
| ViT base (256 x 256 x 3 x 32)/forward/CPU/Lux | 22423672030 ns | 19034043181 ns | 1.18 |
| ViT small (256 x 256 x 3 x 4)/forward/CUDA/Reactant | 1257814655 ns | 1337880973 ns | 0.94 |
| ViT small (256 x 256 x 3 x 4)/forward/CUDA/Lux | 8327637 ns | 9140912.5 ns | 0.91 |
| ViT small (256 x 256 x 3 x 4)/forward/CPU/Reactant | 1634348478 ns | 1693455123 ns | 0.97 |
| ViT small (256 x 256 x 3 x 4)/forward/CPU/Lux | 2178014879 ns | 2263974104 ns | 0.96 |
| ViT tiny (256 x 256 x 3 x 32)/forward/CUDA/Reactant | 1257786527 ns | 1332284786.5 ns | 0.94 |
| ViT tiny (256 x 256 x 3 x 32)/forward/CUDA/Lux | 90765896 ns | 86734383.5 ns | 1.05 |
| ViT tiny (256 x 256 x 3 x 32)/forward/CPU/Reactant | 2171489336 ns | 2287781106 ns | 0.95 |
| ViT tiny (256 x 256 x 3 x 32)/forward/CPU/Lux | 4967371369 ns | 6118937863 ns | 0.81 |
| ViT tiny (256 x 256 x 3 x 4)/forward/CUDA/Reactant | 1329525357.5 ns | 1274319613 ns | 1.04 |
| ViT tiny (256 x 256 x 3 x 4)/forward/CUDA/Lux | 7794196 ns | 7577830 ns | 1.03 |
| ViT tiny (256 x 256 x 3 x 4)/forward/CPU/Reactant | 1474849210 ns | 1512493678 ns | 0.98 |
| ViT tiny (256 x 256 x 3 x 4)/forward/CPU/Lux | 1724297638 ns | 1456174515 ns | 1.18 |
| ViT tiny (256 x 256 x 3 x 16)/forward/CUDA/Reactant | 1306781721.5 ns | 1263335461 ns | 1.03 |
| ViT tiny (256 x 256 x 3 x 16)/forward/CUDA/Lux | 11562433 ns | 11439682 ns | 1.01 |
| ViT tiny (256 x 256 x 3 x 16)/forward/CPU/Reactant | 1768896684 ns | 1805020552 ns | 0.98 |
| ViT tiny (256 x 256 x 3 x 16)/forward/CPU/Lux | 2646587434.5 ns | 2506386037 ns | 1.06 |
| ViT small (256 x 256 x 3 x 16)/forward/CUDA/Reactant | 1309249187 ns | 1307096998 ns | 1.00 |
| ViT small (256 x 256 x 3 x 16)/forward/CUDA/Lux | 88858488 ns | 87573137 ns | 1.01 |
| ViT small (256 x 256 x 3 x 16)/forward/CPU/Reactant | 2229256097 ns | 2278669956 ns | 0.98 |
| ViT small (256 x 256 x 3 x 16)/forward/CPU/Lux | 3817263196 ns | 3700585497 ns | 1.03 |
| ViT small (256 x 256 x 3 x 32)/forward/CUDA/Reactant | 1307760748 ns | 1310298018 ns | 1.00 |
| ViT small (256 x 256 x 3 x 32)/forward/CUDA/Lux | 118030839 ns | 109805738 ns | 1.07 |
| ViT small (256 x 256 x 3 x 32)/forward/CPU/Reactant | 3029932924 ns | 3158445388 ns | 0.96 |
| ViT small (256 x 256 x 3 x 32)/forward/CPU/Lux | 9978512015 ns | 14341152106 ns | 0.70 |
| ViT base (256 x 256 x 3 x 16)/forward/CUDA/Reactant | 1356889531 ns | 1384173668 ns | 0.98 |
| ViT base (256 x 256 x 3 x 16)/forward/CUDA/Lux | 122479512.5 ns | 125016343 ns | 0.98 |
| ViT base (256 x 256 x 3 x 16)/forward/CPU/Reactant | 3198347884 ns | 3254191833 ns | 0.98 |
| ViT base (256 x 256 x 3 x 16)/forward/CPU/Lux | 7412499032 ns | 6513914428 ns | 1.14 |
| ViT base (256 x 256 x 3 x 4)/forward/CUDA/Reactant | 1335763412 ns | 1339992292 ns | 1.00 |
| ViT base (256 x 256 x 3 x 4)/forward/CUDA/Lux | 88481951 ns | 81713817 ns | 1.08 |
| ViT base (256 x 256 x 3 x 4)/forward/CPU/Reactant | 1893395827 ns | 1963454757 ns | 0.96 |
| ViT base (256 x 256 x 3 x 4)/forward/CPU/Lux | 2734552651 ns | 2503177627 ns | 1.09 |
This comment was automatically generated by workflow using github-action-benchmark.
speeds up compile time
i added some utilities for Julia-like conversion in C++ and better communication between Julia and C.
for example, imagine a function that returns an `absl::Span` of devices:

```cpp
extern "C" span<Device*> the_function(...) {
    return convert(Type<span<Device*>>(), some_func_that_returns_a_absl_Span(...));
}
```

this works conceptually; in practice i need to add some particular implementations for it to work (i'm sure it can be done more generically, but i have neither the time nor the energy to learn the terribly complicated C++).
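A minimal, self-contained sketch of what one such "particular implementation" of `convert` might look like. It is not the code from this PR: the `span` layout, the `Type` tag, `free_device_span`, and the use of `std::vector` in place of `absl::Span` are all assumptions made so the example compiles without XLA or Abseil.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical C-compatible span: a length plus a pointer, readable from Julia
// as a plain struct. The real layout used in the PR may differ.
template <typename T>
struct span {
    std::size_t size;
    T* ptr;
};

// Tag type mimicking Julia's `convert(::Type{T}, x)` dispatch style.
template <typename T>
struct Type {};

// One specific overload: std::vector<T> -> span<T>.
// std::vector stands in for absl::Span to keep the sketch dependency-free.
template <typename T>
span<T> convert(Type<span<T>>, const std::vector<T>& v) {
    // Copy into a heap buffer so the data outlives this call.
    // The caller owns the buffer and must release it (see free_device_span).
    T* buf = new T[v.size()];
    for (std::size_t i = 0; i < v.size(); ++i) buf[i] = v[i];
    return span<T>{v.size(), buf};
}

struct Device { int id; };

// Stand-in for an XLA call that returns a list of devices.
std::vector<Device*> some_func_that_returns_devices() {
    static Device d0{0}, d1{1};
    return {&d0, &d1};
}

extern "C" span<Device*> the_function() {
    return convert(Type<span<Device*>>(), some_func_that_returns_devices());
}

// Hypothetical companion that frees the buffer allocated by convert.
extern "C" void free_device_span(span<Device*> s) { delete[] s.ptr; }
```

Each new source type needs its own `convert` overload in this style, which matches the "particular implementations" caveat above. The copy is a deliberate choice here: the span handed to Julia stays valid after the C++ temporary is gone, at the cost of an explicit free.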
This time for real.
- `ifrt_loadedexecutable_parameter_shardings`
- `ifrt_loadedexecutable_output_shardings`
- `ifrt_loadedexecutable_parameter_layouts`
- `ifrt_loadedexecutable_output_layouts`
- `GetOutputMemoryKinds`
- `GetCostAnalysis`
- `Execute`
- check whether `DeserializeExecutableOptions` and `CompileOptions` are really deprecated

@wsmoses i have some doubts and problems for which i would probably need help from XLA people (also, my C++ is a bit rusty).

my main problem is with C++ semantics on ownership, copying and moving: a lot of the XLA API doesn't return pointers but plain objects, which i need to move to the heap to have a pointer that i can send to Julia. i can't just take the address of the object returned by XLA because it will be on the stack and freed when returning from the C function. unfortunately, most of them don't have copy or move constructors, so i can't just do `new Object(myObject)` or `new Object(std::move(myObject))`.
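One possible way around the missing copy and move constructors, sketched under the assumption that the build uses C++17 or later: when the object comes straight out of a function call, `new T(f())` initializes the heap object directly from the returned prvalue (guaranteed copy elision), so no copy or move constructor is needed. `Pinned`, `make_pinned`, and the `pinned_*` functions below are hypothetical stand-ins, not XLA types.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical type that is neither copyable nor movable,
// standing in for an XLA object returned by value.
class Pinned {
public:
    explicit Pinned(int id) : id_(id) {}
    Pinned(const Pinned&) = delete;
    Pinned(Pinned&&) = delete;
    int id() const { return id_; }
private:
    int id_;
};

static_assert(!std::is_move_constructible_v<Pinned>,
              "Pinned really has no usable move constructor");

// Stand-in for an API that returns by value. Returning a prvalue of a
// non-movable type is legal since C++17.
Pinned make_pinned(int id) { return Pinned(id); }

extern "C" Pinned* pinned_create(int id) {
    // The prvalue returned by make_pinned() initializes the heap object
    // in place; no copy or move constructor is called.
    return new Pinned(make_pinned(id));
}

extern "C" int pinned_id(const Pinned* p) {
    assert(p != nullptr);
    return p->id();
}

// Would be called from a Julia finalizer to release the object.
extern "C" void pinned_free(Pinned* p) { delete p; }
```

This only works when the call expression is passed directly to `new`. Once the result has been bound to a named local variable, it is an lvalue again and a copy or move constructor would be required.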
other issues:

- for a `std::shared_ptr` or a `tsl::RCReference`, i just create it with the pointer passed from Julia. i guess this makes Julia lose ownership of the pointed object, so should we track the ownership or should we remove the Julia object?
- for a `std::shared_ptr`, is it safe to just return the pointer by calling `.get()`? or must we copy the pointed object to another allocation?
- for `tsl::RCReference` i'm just calling `release` to return the pointer, which i guess is ok?
- how do we convert an `xla::PjRtFuture<>` to a pointer so we can pass it through the C API? or can we wrap it in an opaque block?
- for polymorphic classes (`Client`, `Device`, ...), i fear that destructors won't be called when objects are GCed in Julia. destructors on polymorphic objects aren't called unless the destructor of the base class is virtual, right?
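On the `std::shared_ptr` and destructor questions, a hedged sketch of one common pattern: instead of returning `.get()` (which does not keep the object alive once the last `shared_ptr` is gone), heap-allocate the `shared_ptr` itself and hand Julia a pointer to it as an opaque handle. `Base`, `Derived`, `BaseHandle`, and the `base_*` functions are hypothetical names for illustration; whether this fits the XLA types in this PR is an assumption.

```cpp
#include <cassert>
#include <memory>

static int destroyed = 0;  // counts ~Derived runs, for demonstration only

// Hypothetical polymorphic hierarchy standing in for Client, Device, ...
struct Base {
    // Virtual, so that `delete` through a Base* would also run ~Derived.
    virtual ~Base() = default;
    virtual int kind() const = 0;
};

struct Derived : Base {
    ~Derived() override { ++destroyed; }
    int kind() const override { return 42; }
};

// Opaque handle: a heap-allocated shared_ptr that holds one strong
// reference on behalf of the Julia side.
using BaseHandle = std::shared_ptr<Base>;

extern "C" BaseHandle* base_create() {
    return new BaseHandle(std::make_shared<Derived>());
}

extern "C" int base_kind(const BaseHandle* h) {
    assert(h != nullptr && *h != nullptr);
    return (*h)->kind();
}

extern "C" long base_use_count(const BaseHandle* h) {
    return h->use_count();
}

// Would be called from a Julia finalizer: drops the handle's reference,
// and the object is destroyed if that was the last one.
extern "C" void base_free(BaseHandle* h) { delete h; }
```

With this layout Julia never owns the raw object, only a reference count share, so C++ code that still holds its own `shared_ptr` keeps working after the Julia object is collected. The destructor only runs if the Julia wrapper registers a finalizer that calls the free function; garbage collection alone does not invoke C++ destructors.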