inference-directed codegen via CodeInstance edges and typeinf_ext_toplevel APIs #56880
base: master
Conversation
Sometimes an edge (especially one from a precompile file, but sometimes one from inference) will specify a CodeInstance that does not need to be compiled for its ABI and simply needs to be cloned to point to the existing copy of it.
…tandard helper function
This was failing the h_world_age test sometimes.
This avoids unnecessary compression when running (not generating code). While generating code, we keep the legacy behavior of storing compressed code, since restarting from a ji file without it is quite slow. Eventually, we should remove that code too, once the object file has been generated from it.

This replaces the defective SOURCE_MODE_FORCE_SOURCE option with a new `typeinf_ext_toplevel` batch-mode interface for compilation, which returns all required source code. Only two options remain now:

- SOURCE_MODE_NOT_REQUIRED: require only that the IPO information (e.g. rettype and friends) is present.
- SOURCE_MODE_FORCE_ABI: require that the IPO information is present (for ABI computation) and that the returned CodeInstance can be invoked on the host target (preferably after inference, called directly, but it is perfectly acceptable for Base.Compiler to instead force the runtime to use a stub there, or to call into it with the interpreter, by having failed to provide any code).

This replaces the awkward `jl_create_native` interface (which is now just a shim for calling the new batch-mode `typeinf_ext_toplevel`) with a simpler `jl_emit_native` API, which does not do any inference or other callbacks, but is simply a batch-mode call to `jl_emit_codeinfo` plus the work to build the external wrappers around the results for linkage.
```c
ct->world_age = jl_typeinf_world;
codeinfos = (jl_array_t*)jl_apply(fargs, 4);
ct->world_age = last_age;
```
```c
#define jl_is_array_any(v) jl_typetagis(v,jl_array_any_type)
```
Move to header?
```c
// also be used by external consumers like GPUCompiler.jl to obtain a module containing
// all reachable & inferrable functions.
void *jl_emit_native_impl(jl_array_t *codeinfos, LLVMOrcThreadSafeModuleRef llvmmod, const jl_cgparams_t *cgparams, int _external_linkage)
```
Mark this DLLEXPORT?
I think from a GPUCompiler perspective we would also be okay with an interface that just takes a single CodeInstance + CodeInfo.
This builds upon the many efforts around using CodeInstance everywhere (especially as the return value from `jl_type_infer` and the input format to edges) by moving a lot of hard-coded algorithms that were previously in C (such as `recursive_compile_graph` and `jl_ci_cache_lookup`), and which were therefore also previously slightly broken (especially in concurrent environments), into Julia's Compiler.jl code, where we can most likely maintain them much better going forward. See the descriptions in the individual commits for some of the specifics of the changes and fixes, and for how to change existing code to use these APIs correctly. In followup stages, most of the code relevant to precompile_utils, trim, and even the allocation-checker should be considered for moving into Julia as well, since being written in C/C++ currently provides negative value for maintaining those, and the change in the API boundary should now make that additional conversion easier.

Gives a considerably smaller system image, despite having more code, by using better algorithms which avoid allocating permanent garbage: 155 MB -> 147 MB in `.text`.
Makes a slightly larger Pkg.ji file cache, hopefully mainly due to being more strategic about what code is compiled (because that logic now mostly lives in Julia instead of C), as it appears to have both inferred and compiled about 10% more code according to a high-level analysis of it:
```
$ du -sh usr/share/julia/compiled/v1.12/
237M # on PR
222M # on master
```