Use new imglib2 parallelization approach #83
Conversation
```diff
@@ -34,6 +34,7 @@
 /**
  * Set the {@link ExecutorService} to be used for convolution.
  */
+@Deprecated
```
Without having studied the code too closely: does this work? I.e., if some existing downstream code is using setExecutor, won't that executor now be ignored, instead of used when executing? Or did I miss where this is handled in some descendant class?
This method now works again.
I didn't look at it in detail yet, but I think the …
What's the state of this PR? Can we have …
This PR requires imglib/imglib2#310 (IterableLoopBuilder) to be merged and released first.
The imglib2 Parallelization context together with the multi-threaded LoopBuilder make it convenient to implement multi-threading in imglib2 algorithms (as shown in this PR). There's no need for an ExecutorService or numTasks parameter; the method signatures remain clean. But this raises an important question: how should we deal with existing single-threaded methods, should we make them multi-threaded? Possible answers:
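To illustrate the idea of a thread-local parallelization context that keeps method signatures free of ExecutorService/numTasks parameters, here is a minimal stdlib sketch. The names `ParallelContext`, `runWithNumThreads`, and `observedThreads` are hypothetical stand-ins; this is not imglib2's actual implementation.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of a thread-local parallelization context,
// NOT imglib2's actual Parallelization class.
class ParallelContext {
    private static final ThreadLocal<Integer> NUM_THREADS =
            ThreadLocal.withInitial(() -> Runtime.getRuntime().availableProcessors());

    // Algorithms read their thread budget from the context instead of a parameter.
    static int numThreads() {
        return NUM_THREADS.get();
    }

    // Run the task with the given thread budget, restoring the old value afterwards.
    static void runWithNumThreads(int n, Runnable task) {
        Integer old = NUM_THREADS.get();
        NUM_THREADS.set(n);
        try {
            task.run();
        } finally {
            NUM_THREADS.set(old);
        }
    }
}

class ContextDemo {
    // An "algorithm" whose signature stays clean: no ExecutorService parameter.
    static int observedThreads() {
        return ParallelContext.numThreads();
    }

    public static void main(String[] args) {
        AtomicInteger seen = new AtomicInteger();
        ParallelContext.runWithNumThreads(1, () -> seen.set(observedThreads()));
        System.out.println(seen.get()); // prints 1
    }
}
```

Because the setting lives in a ThreadLocal, wrapping a task in `runWithNumThreads(1, ...)` affects only the calling thread, which is the property the discussion below relies on.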
@tpietzsch @axtimwalde @StephanPreibisch What are your thoughts?
Also discussed on Gitter: https://gitter.im/imglib/imglib2?at=618bd07238377967f4a42c73 — copy & paste from there: Curtis Rueden @ctrueden Nov 10 16:10, Jean-Yves Tinevez @tinevez Nov 10 16:51, Tobias Pietzsch @tpietzsch Nov 10 17:36, Tobias Pietzsch @tpietzsch Nov 10 17:41.
Auto-multi-threading can be bad when parallelizing things with Spark. The Parallelization context will have unintended consequences when several tasks run in the same JVM (which is typical). Am I overthinking this?
Maybe I'm "underthinking" it, but wouldn't it be sufficient to then just use …
Hi @axtimwalde, I hope the parallelization context will also be useful for your code that runs on Spark. It is actually intended for those use cases, where a part of the code should run single-threaded. And no worries, it's thread safe: it uses ThreadLocal to store one configuration per thread. As Tobi suggested, all you need to do is wrap the tasks that you execute on Spark into a call of …
Maybe I am not understanding this well enough. However, …
Yes, that is correct. TaskExecutors.singleThreaded() returns a singleton executor. This executor is the only instance of the class SequentialTaskExecutor. But this won't cause any trouble: the class SequentialTaskExecutor has no fields and therefore no internal state or state changes. One could have multiple instances of SequentialTaskExecutor; it wouldn't make any difference, they would all behave exactly the same at any time. I wanted to make the overhead of calling TaskExecutors.singleThreaded() as small as possible, and returning a singleton is faster than creating new instances. By the way, the same is true for TaskExecutors.multiThreaded() and the respective ForkJoinExecutorService: there is always only one instance, no fields, no internal state, no need to create multiple instances.
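The "fieldless singleton is safe to share" argument above can be shown with a small stdlib sketch. This `SequentialTaskExecutor` is a stand-in for illustration, not imglib2's actual class:

```java
// Sketch: a class with no fields has no state that concurrent callers could
// corrupt, so a single shared instance is as good as many instances.
// This is an illustrative stand-in, not imglib2's SequentialTaskExecutor.
final class SequentialTaskExecutor {
    static final SequentialTaskExecutor INSTANCE = new SequentialTaskExecutor();

    private SequentialTaskExecutor() {}

    // Runs tasks one after another on the calling thread; no fields, no state.
    void runAll(Runnable... tasks) {
        for (Runnable task : tasks) {
            task.run();
        }
    }
}
```

Every caller gets the same `INSTANCE`, and since the executor keeps no state between calls, sharing it across threads cannot cause interference; returning the singleton just avoids allocation overhead.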
No. The SequentialTaskExecutor isn't that powerful. Let us assume Spark is configured to start 8 threads in a JVM. Calling Parallelization.runSingleThreaded(...) won't change anything about this. All it does is prevent the imglib2 algorithms from spawning new threads. And I assume that's exactly what you need.
@maarzt @tpietzsch @axtimwalde Is this PR still relevant? It would be great to get this library using the unified multithreading approach!
I think this is still relevant and looks good if running tasks …
I will go over this PR, check everything, and prepare it to be merged. |
This simplifies the existing implementation while staying backwards compatible. The code adds a new method, Thresholder.threshold( source, threshold, above ), that is multi-threaded according to the Parallelization context.
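To make the shape of that method concrete, here is a toy stdlib analogue of `Thresholder.threshold( source, threshold, above )` over a plain double array. The class name `Thresholds` and the parallel-stream implementation are illustrative assumptions, not the imglib2 code:

```java
import java.util.stream.IntStream;

// Toy stdlib analogue of Thresholder.threshold(source, threshold, above):
// builds a boolean mask, optionally in parallel. NOT the imglib2 implementation.
class Thresholds {
    static boolean[] threshold(double[] source, double threshold, boolean above) {
        boolean[] mask = new boolean[source.length];
        // Each index is written exactly once, so the parallel forEach is safe.
        IntStream.range(0, source.length).parallel()
                .forEach(i -> mask[i] = above ? source[i] > threshold
                                              : source[i] < threshold);
        return mask;
    }
}
```

The `above` flag picks the direction of the comparison; in the real API the degree of parallelism would come from the Parallelization context rather than being hard-coded.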
…lib2 Parallelization context: Add new methods DifferenceOfGaussian.DoG that don't require an ExecutorService but use the Parallelization context. Replace the old implementation with multi-threaded LoopBuilder.
No new methods are added. Existing methods behave the same, but can be configured using the Parallelization context.
Simplify the code by using multi-threaded LoopBuilder. Many existing methods become multi-threaded by default.
Ok, I added tests and updated the javadoc. I would say this PR is ready to be merged!
In this PR I change imglib2-algorithm to use the parallelization framework introduced in imglib/imglib2#269! I want to find out if the parallelization framework is useful. And this is the result:
Yes! 🎉 All multi-threading in imglib2-algorithm can be replaced by the parallelization framework.
Most multi-threading code can be replaced with multi-threaded LoopBuilder or IterableLoopBuilder. This also leads to extreme simplification of the code. IterableLoopBuilder is part of this PR; it's like LoopBuilder but has an IterableInterval as first argument. Some methods that were previously single-threaded now run multi-threaded. It's still possible to execute them single-threaded: use Parallelization.runSingleThreaded( () -> ... ). We might either just stick with this, or give multi-threaded functions a different name and provide a backward-compatible function. The affected methods are:
The work isn't finished yet: classes Erosion, Dilation, Opening, Closing, TopHat, BlackTopHat could have methods without the numThread parameter.
Overall this PR shows that imglib2-algorithm can be greatly simplified using the new parallelization framework. It reduces code duplication: for many algorithms there is currently both a single-threaded and a multi-threaded version, which is no longer needed.
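The LoopBuilder pattern this PR leans on can be sketched with a toy fluent analogue over plain arrays. `ArrayLoop`, `forEachPixel`, and `multiThreaded` here only mirror the shape of the real `LoopBuilder.setImages(...).multiThreaded().forEachPixel(...)` API; the real one operates on imglib2 images, not arrays:

```java
import java.util.stream.IntStream;

// Toy fluent analogue of the LoopBuilder API over plain double arrays.
// NOT imglib2's LoopBuilder; names and signatures are illustrative only.
final class ArrayLoop {
    interface PairAction {
        // Receives the input value, the output array, and the current index.
        void accept(double in, double[] out, int i);
    }

    private final double[] in;
    private final double[] out;
    private boolean parallel = false;

    private ArrayLoop(double[] in, double[] out) {
        this.in = in;
        this.out = out;
    }

    static ArrayLoop setImages(double[] in, double[] out) {
        return new ArrayLoop(in, out);
    }

    // Opt in to parallel execution, analogous to LoopBuilder.multiThreaded().
    ArrayLoop multiThreaded() {
        this.parallel = true;
        return this;
    }

    // Run the action once per index, sequentially or via a parallel stream.
    void forEachPixel(PairAction action) {
        IntStream range = IntStream.range(0, in.length);
        (parallel ? range.parallel() : range)
                .forEach(i -> action.accept(in[i], out, i));
    }
}
```

Usage looks like `ArrayLoop.setImages(in, out).multiThreaded().forEachPixel((v, o, i) -> o[i] = v * 2)`, which is the one-liner style that lets the PR delete hand-rolled thread-pool code.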