
Support IFRT #164

Draft · wants to merge 20 commits into main

Conversation

@mofeing (Collaborator) commented Oct 4, 2024

This time for real.

  • IFRT
    • Value
    • Tuple
      • Unpack
    • Memory
    • Device
      • Memories
    • Sharding
    • Array
    • Topology
    • Client
    • HostCallback
    • LoadedHostCallback
    • Executable
    • LoadedExecutable
      • fix type conversion in the following functions
        • ifrt_loadedexecutable_parameter_shardings
        • ifrt_loadedexecutable_output_shardings
        • ifrt_loadedexecutable_parameter_layouts
        • ifrt_loadedexecutable_output_layouts
      • implement in C-API
        • GetOutputMemoryKinds
        • GetCostAnalysis
        • Execute
    • Compiler
      • check that DeserializeExecutableOptions and CompileOptions are really deprecated
    • DType
    • Shape
    • DynamicShape
    • Index
    • IndexDomain
    • MemoryKind
  • IFRT-PjRt backend
    • PjRtArray
    • PjRtClient
    • PjRtCompiler
    • PjRtDevice
    • PjRtExecutable
    • PjRtHostSendAndRecvLoadedHostCallback
    • PjRtLoadedExecutable
    • PjRtMemory
    • PjRtTopology
    • PjRtTuple

@wsmoses I have some doubts and problems where I would probably need help from the XLA people (also, my C++ is a bit rusty). My main problem is with C++ semantics around ownership, copying, and moving: a lot of the XLA API doesn't return pointers but plain value objects, which I need to move to the heap in order to have a pointer I can hand to Julia. I can't just take the address of the object returned from XLA, because it lives on the stack and is freed when the C function returns. Unfortunately, most of these types don't have copy or move constructors, so I can't just do new Object(myObject) or new Object(std::move(myObject)).
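
For reference, here is a minimal sketch of the pattern in question, using a hypothetical movable type Widget (not an actual XLA type); it compiles only because Widget has a usable move/copy constructor, which is exactly what most of these XLA types lack:

#include <utility>

// Hypothetical value type standing in for an XLA object that is returned by value.
struct Widget {
    int payload;
};

// Stand-in for an XLA function that returns a plain object, not a pointer.
Widget MakeWidget() { return Widget{42}; }

// C binding: move the returned value onto the heap so Julia gets a stable pointer.
extern "C" Widget* c_make_widget() {
    Widget w = MakeWidget();          // lives on this function's stack
    return new Widget(std::move(w));  // requires a move (or copy) constructor
}

// Matching destructor for the Julia finalizer to call.
extern "C" void c_widget_free(Widget* w) { delete w; }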

other issues:

  1. Regarding ref-counted objects:
    • When the XLA API needs a std::shared_ptr or a tsl::RCReference, I just construct it from the pointer passed in from Julia. I guess this makes Julia lose ownership of the pointed-to object, so should we track that ownership, or should we drop the Julia object?
    • When the API returns a std::shared_ptr, is it safe to just return the pointer from .get(), or must we copy the pointed-to object into a separate allocation?
    • With tsl::RCReference I'm just calling release to obtain the pointer, which I guess is OK?
  2. How can we convert an xla::PjRtFuture<> to a pointer so we can pass it through the C-API? Or can we wrap it in an opaque block? (See the sketch after this list.)
  3. If we only use the base classes (e.g. Client, Device, ...), I fear that destructors won't be called when objects are GCed in Julia. Destructors of derived objects aren't called when deleting through a base-class pointer unless the base destructor is virtual, right?
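
On (2), one possible approach (just a sketch with made-up function names, not part of the existing bindings) is to keep the future in a heap-allocated holder and hand Julia an opaque pointer to it; this assumes xla::PjRtFuture<> is movable, which it should be since it wraps a ref-counted async value:

#include <utility>

#include "xla/pjrt/pjrt_future.h"

// Opaque holder so the future can live behind a single stable pointer.
struct FutureHolder {
    xla::PjRtFuture<> future;
};

// Hypothetical stand-in for whichever XLA call actually produces the future.
xla::PjRtFuture<> SomeXlaCallReturningAFuture();

extern "C" FutureHolder* reactant_future_create() {
    return new FutureHolder{SomeXlaCallReturningAFuture()};
}

// Block until the future is ready (the returned status is ignored in this sketch).
extern "C" void reactant_future_await(FutureHolder* h) { (void)h->future.Await(); }

extern "C" void reactant_future_free(FutureHolder* h) { delete h; }

On (1), returning the raw pointer from .get() is only safe while something else keeps a shared_ptr (and hence the refcount) alive; a common workaround is to heap-allocate a copy of the shared_ptr itself and free that from the Julia finalizer. On (3), yes: deleting a derived object through a base-class pointer is undefined behavior unless the base destructor is virtual; the IFRT interfaces are abstract base classes and should declare virtual destructors, but it's worth confirming in the headers.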

This comment was marked as outdated.


@github-actions github-actions bot left a comment


Reactant.jl Benchmarks

| Benchmark suite | Current: 7a96406 | Previous: deefd18 | Ratio |
| --- | --- | --- | --- |
| ViT base (256 x 256 x 3 x 32)/forward/CUDA/Reactant | 1301347763 ns | 1354313354 ns | 0.96 |
| ViT base (256 x 256 x 3 x 32)/forward/CUDA/Lux | 217103056 ns | 208305734 ns | 1.04 |
| ViT base (256 x 256 x 3 x 32)/forward/CPU/Reactant | 5365149260 ns | 5150619393 ns | 1.04 |
| ViT base (256 x 256 x 3 x 32)/forward/CPU/Lux | 22423672030 ns | 19034043181 ns | 1.18 |
| ViT small (256 x 256 x 3 x 4)/forward/CUDA/Reactant | 1257814655 ns | 1337880973 ns | 0.94 |
| ViT small (256 x 256 x 3 x 4)/forward/CUDA/Lux | 8327637 ns | 9140912.5 ns | 0.91 |
| ViT small (256 x 256 x 3 x 4)/forward/CPU/Reactant | 1634348478 ns | 1693455123 ns | 0.97 |
| ViT small (256 x 256 x 3 x 4)/forward/CPU/Lux | 2178014879 ns | 2263974104 ns | 0.96 |
| ViT tiny (256 x 256 x 3 x 32)/forward/CUDA/Reactant | 1257786527 ns | 1332284786.5 ns | 0.94 |
| ViT tiny (256 x 256 x 3 x 32)/forward/CUDA/Lux | 90765896 ns | 86734383.5 ns | 1.05 |
| ViT tiny (256 x 256 x 3 x 32)/forward/CPU/Reactant | 2171489336 ns | 2287781106 ns | 0.95 |
| ViT tiny (256 x 256 x 3 x 32)/forward/CPU/Lux | 4967371369 ns | 6118937863 ns | 0.81 |
| ViT tiny (256 x 256 x 3 x 4)/forward/CUDA/Reactant | 1329525357.5 ns | 1274319613 ns | 1.04 |
| ViT tiny (256 x 256 x 3 x 4)/forward/CUDA/Lux | 7794196 ns | 7577830 ns | 1.03 |
| ViT tiny (256 x 256 x 3 x 4)/forward/CPU/Reactant | 1474849210 ns | 1512493678 ns | 0.98 |
| ViT tiny (256 x 256 x 3 x 4)/forward/CPU/Lux | 1724297638 ns | 1456174515 ns | 1.18 |
| ViT tiny (256 x 256 x 3 x 16)/forward/CUDA/Reactant | 1306781721.5 ns | 1263335461 ns | 1.03 |
| ViT tiny (256 x 256 x 3 x 16)/forward/CUDA/Lux | 11562433 ns | 11439682 ns | 1.01 |
| ViT tiny (256 x 256 x 3 x 16)/forward/CPU/Reactant | 1768896684 ns | 1805020552 ns | 0.98 |
| ViT tiny (256 x 256 x 3 x 16)/forward/CPU/Lux | 2646587434.5 ns | 2506386037 ns | 1.06 |
| ViT small (256 x 256 x 3 x 16)/forward/CUDA/Reactant | 1309249187 ns | 1307096998 ns | 1.00 |
| ViT small (256 x 256 x 3 x 16)/forward/CUDA/Lux | 88858488 ns | 87573137 ns | 1.01 |
| ViT small (256 x 256 x 3 x 16)/forward/CPU/Reactant | 2229256097 ns | 2278669956 ns | 0.98 |
| ViT small (256 x 256 x 3 x 16)/forward/CPU/Lux | 3817263196 ns | 3700585497 ns | 1.03 |
| ViT small (256 x 256 x 3 x 32)/forward/CUDA/Reactant | 1307760748 ns | 1310298018 ns | 1.00 |
| ViT small (256 x 256 x 3 x 32)/forward/CUDA/Lux | 118030839 ns | 109805738 ns | 1.07 |
| ViT small (256 x 256 x 3 x 32)/forward/CPU/Reactant | 3029932924 ns | 3158445388 ns | 0.96 |
| ViT small (256 x 256 x 3 x 32)/forward/CPU/Lux | 9978512015 ns | 14341152106 ns | 0.70 |
| ViT base (256 x 256 x 3 x 16)/forward/CUDA/Reactant | 1356889531 ns | 1384173668 ns | 0.98 |
| ViT base (256 x 256 x 3 x 16)/forward/CUDA/Lux | 122479512.5 ns | 125016343 ns | 0.98 |
| ViT base (256 x 256 x 3 x 16)/forward/CPU/Reactant | 3198347884 ns | 3254191833 ns | 0.98 |
| ViT base (256 x 256 x 3 x 16)/forward/CPU/Lux | 7412499032 ns | 6513914428 ns | 1.14 |
| ViT base (256 x 256 x 3 x 4)/forward/CUDA/Reactant | 1335763412 ns | 1339992292 ns | 1.00 |
| ViT base (256 x 256 x 3 x 4)/forward/CUDA/Lux | 88481951 ns | 81713817 ns | 1.08 |
| ViT base (256 x 256 x 3 x 4)/forward/CPU/Reactant | 1893395827 ns | 1963454757 ns | 0.96 |
| ViT base (256 x 256 x 3 x 4)/forward/CPU/Lux | 2734552651 ns | 2503177627 ns | 1.09 |

This comment was automatically generated by workflow using github-action-benchmark.

@mofeing changed the title from "Write Bazel rule for IFRT dialect" to "Support IFRT" on Dec 30, 2024
@mofeing (Collaborator, Author) commented Jan 2, 2025

I added some utilities for Julia-like conversion in C++ and for better communication between Julia and C.

  • Type<T> is just a placeholder for representing a target type
  • convert converts the passed object into an object we can pass through the FFI (it helps a lot with deduplicating code and avoiding bugs)
  • span<T> is a small std::span-like object, but usable before C++20 and ABI-stable, so we can view it from Julia as a Tuple{Csize_t, Ptr{Cvoid}}

For example, imagine a function that returns an absl::Span<tsl::RCReference<Device>>. Then our C binding would look like this:

extern "C" span<Device*> the_function(...) {
    return convert(Type<span<Device*>>(), some_func_that_returns_a_absl_Span(...));
}

Conceptually this works; in practice I need to add some specialized implementations to make it work (I'm sure it could be done more generically, but I have neither the time nor the energy to learn the terribly complicated C++ required).
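
For illustration, a minimal sketch of what the Type<T> and span<T> helpers could look like (the exact field names and layout are assumed for illustration, not necessarily the actual implementation); the size field comes first so Julia can reinterpret the struct as Tuple{Csize_t, Ptr{Cvoid}}:

#include <cstddef>

// Tag type used only to select the right convert overload at compile time.
template <typename T>
struct Type {};

// ABI-stable, pre-C++20 stand-in for std::span: a size followed by a data pointer.
template <typename T>
struct span {
    std::size_t size;
    T* data;
};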
