-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add special build for testing serialization via a serialization roundtrip in JIT compilation and fix serialization leaks #7763
Add special build for testing serialization via a serialization roundtrip in JIT compilation and fix serialization leaks #7763
Conversation
src/Pipeline.cpp
Outdated
@@ -582,6 +582,25 @@ void Pipeline::compile_jit(const Target &target_arg) { | |||
// Clear all cached info in case there is an error. | |||
contents->invalidate_cache(); | |||
|
|||
#ifdef WITH_SERIALZATION_JIT | |||
// TODO: replace file serialization with in-memory serialization |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: please open a Issue for this task and add the comment here e.g. TODO(http://etc):
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Derek already implement the API in his PR so I will refer to that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with minor nits
Updated |
src/CMakeLists.txt
Outdated
@@ -508,6 +508,15 @@ if (WITH_SERIALIZATION) | |||
target_compile_definitions(Halide PRIVATE WITH_SERIALIZATION) | |||
endif () | |||
|
|||
# Enable serialization testing by intercepting JIT compilation with a serialization roundtrip; | |||
# This is used only for special builds made specifically for testing, and must be disabled by default. | |||
option(WITH_SERIALIZATION_JIT "Intercepting JIT compilation with a serialization roundtrip, for test only" OFF) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: rename this to something more evocative of testing, e.g. WITH_SERIALIZATION_JIT_ROUNDTRIP_TESTING
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update: I took the liberty of making this change myself, so that I could test the necessary buildbot changes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sry, I should have done it last night, thanks!
This comment was marked as outdated.
This comment was marked as outdated.
(I'll see if I can get a local build with symbols enabled to get any better info on this) |
This comment was marked as outdated.
This comment was marked as outdated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have to understand how/why these leaks are occurring before this can land -- are they intrinsic to the serializer, or a byproduct of the testing hack?
thanks for reporting this, this directly confirms my previous hypothesis that serialization code contains leaks, we found it because our fuzzer driver will oom after running 30k ish pipelines(each run has a deserialization call). i'm pretty sure it's not because of the jit hack, it's part of the serialization. Last time i had no luck using valgrind, tmr i'll try figure it out |
It’s probably easier/more accurate to use the asan build configuration for Halide. Val grind is notoriously slow since it’s a runtime intercept. If you look at the symbolized call sites that Steven posted, the allocation locations should lead you to areas to investigate. It appears to be IR and Parameter references that aren’t getting cleaned up or have ownership cycles. |
One place to check is any new constructor-like things you have added, e.g. Function::update_with_deserialization The normal ways Functions acquire right-hand-sides have various checks in place to avoid reference cycles, and that may not be true of the new ways you added. |
Can you give an example of the existing checks? |
okay, it's just like Andrew has guessed, |
Maybe pushing the updates here in this PR is easier for testing? |
OK, fix applied, wait for it to go green (again) |
Still one failure:
|
Locally this test passed, I have a theory that we are deserializing from files and several tests are running together with some race issue on the same file names (i.e. |
To verify my theory, the new commit now uses serialize to memory instead of files, passed all tests on CPU locally, let's see how it goes on the buildbots. |
These failures shouldn't be related to this PR..... |
Looks like the serialization tutorial is throwing an error |
I see the same error in the tutorial PR, then what about the fast_pow one https://buildbot.halide-lang.org/master/#/builders/82/builds/75 |
I'm confused ... I don't see a failure for the serialization tutorial in the PR I made: #7760 Oh ... wait ... nvm. Sorry ... it appears to fail due to WITH_SERIALIZATION not being set on OSX? |
@TH3CHARLie Sorry for not catching this .... adding this to the Makefile should resolve the tutorial issue:
|
@derek-gerstmann tutorial make failure still exsits, the x86 failure seems unrelated though |
That would indicate that
|
Also, it looks like the makefile checks for |
OK now all failures look unrelated (failing on host targets that are not serialization test), PTAL @steven-johnson @derek-gerstmann @abadams |
Makefile
Outdated
@@ -288,6 +288,9 @@ LLVM_SHARED_LIBS = -Wl,-rpath=$(LLVM_LIBDIR) -L $(LLVM_LIBDIR) -lLLVM | |||
LLVM_LIBS_FOR_SHARED_LIBHALIDE=$(if $(WITH_LLVM_INSIDE_SHARED_LIBHALIDE),$(LLVM_STATIC_LIBS),$(LLVM_SHARED_LIBS)) | |||
|
|||
TUTORIAL_CXX_FLAGS ?= -std=c++17 -g -fno-omit-frame-pointer $(RTTI_CXX_FLAGS) -I $(ROOT_DIR)/tools $(SANITIZER_FLAGS) $(LLVM_CXX_FLAGS_LIBCPP) | |||
ifneq (,$(shell which flatc)) | |||
TUTORIAL_CXX_FLAGS += -DWITH_SERIALIZATION -I $(BUILD_DIR) -I $(shell which flatc | sed 's/bin.flatc/include/') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure this is necessary? These flags are the ones that would be set by users of Halide. It would be bad if they needed to set any of this to use serialization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should just remove this part, we already skipping tutorial 23 if no flatc found.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed.
Most failures are related to webGPU, |
…trip in JIT compilation and fix serialization leaks (halide#7763) * add back JIT testing, enclosed in #ifdef blocks * fix typo * nits * WITH_SERIALIZATION_JIT->WITH_SERIALIZATION_JIT_ROUNDTRIP_TESTING * fix self-reference leaks: now uses weak function ptr in reverse function mappings * Move clang-tidy checks back to Linux Recent changes in the GHA runners for macOS don't play well with clang-tidy; rather than sink any more time into debugging it, I'm going to revert the relevant parts of halide#7746 so that it runs on the less-finicky Linux runners instead. * bogus * Update Generator.cpp * Update Generator.cpp * call copy_to_host before serializing buffers * throw an error if we serialize on-device buffer * Skip specialize_to_gpu * Update Pipeline.cpp * Skip two more tests * use serialize to memory during jit testing * makefile update * makefile fix * skip the tutorial if flatc is not there * fix * fix signature * fix makefile * trigger buildbot --------- Co-authored-by: Steven Johnson <[email protected]>
relates #7745
As discussed in dev meeting, I add back the JIT roundtrip test. It works by intercepting the
compile_jit
call, doing a serialization roundtrip inside it and using serialized results to do the compilation.Because its rather hacky, the code is included in #ifdef blocks, and a compile option
WITH_SERIALIZATION_JIT
is added, this should be a test-only option that builds a special version of Halide that have no use other than serialization testing. Under this special build, correctness test suite becomes serialization correctness test.Welcome all thoughts of comments and feedbacks @steven-johnson @abadams @derek-gerstmann