Use new imglib2 parallelization approach #83

maarzt · 2019-09-19T15:20:04Z

In this PR I change imglib2-algorithm to use the parallelization framework introduced in imglib/imglib2#269! I want to find out it the parallelization framework is useful. And that's the result:

Yes! 🎉 All multi-threading in imglib2-algorithm can be replaced by the parallelization framework.

Most multi-threading code can be replaced with multi-threaded LoopBuilder or IterableLoopBuilder. This also leads to extreme simplification of the code.
IterableLoopBuilder is part of this PR, it's like LoopBuilder but has an IterableInterval as first argument.

Some methods, that where previously single threaded, now run multi-threaded. It's still possible to execute them single threaded, use Parallelization.runSingleThreaded( () -> ... ). We might either just stick with this. Or give multi-threaded functions a different name. And provide a backward function.
These affected methods are:

ConnectedComponents.labelAllConnectedComponents( final RandomAccessible< T > input, final ImgLabeling< L, I > labeling, final Iterator< L > labelGenerator, final StructuringElement se )
ConnectedComponents.labelAllConnectedComponents(final RandomAccessible< T > input, final RandomAccessibleInterval< L > output, final StructuringElement se )
TensorEigenValues.calculateEigenValuesSymmetric(final RandomAccessibleInterval< T > tensor, final RandomAccessibleInterval< U > eigenvalues )
TensorEigenValues.calculateEigenValuesSquare(final RandomAccessibleInterval< T > tensor,	final RandomAccessibleInterval< U > eigenvalues )
TensorEigenValues.calculateEigenValues(final RandomAccessibleInterval< T > tensor, final RandomAccessibleInterval< U > eigenvalues, final EigenValues< T, U > ev )
LocalExtrema.findLocalExtrema(final RandomAccessibleInterval< T > source, final LocalNeighborhoodCheck< P, T > localNeighborhoodCheck )
LocalExterma.findLocalExtrema(final RandomAccessibleInterval< T > source, final LocalNeighborhoodCheck< P, T > localNeighborhoodCheck, final Shape shape )
LocalExtrema.findLocalExtrema(final RandomAccessible< T > source,final Interval interval,final LocalNeighborhoodCheck< P, T > localNeighborhoodCheck )
LocalExtrema.findLocalExtrema(final RandomAccessible< T > source,final Interval interval,final LocalNeighborhoodCheck< P, T > localNeighborhoodCheck,final Shape shape )
SubpixelLocalization.refinePeaks(final List< P > peaks, final RandomAccessible< T > img, final Interval validInterval, final boolean returnInvalidPeaks,final int maxNumMoves, final boolean allowMaximaTolerance, final float maximaTolerance, final boolean[] allowedToMoveInDim )
DistanceTransform.transform(...) (6 diffenrent versions)
DistanceTransform.binaryTransform(...) (5 different versions)

The work isn't finished yet: classes Erosion, Dilation, Opening, Closing, TopHat, BlackTopHat could have methods without the numThread parameter.

Overall this PR shows that imglib2-algorithm can be greatly simplified using the new parallelization framework. It reduces code duplication, because for many algorithms there is a single-threaded and multi-threaded version, which is no longer needed.

ctrueden · 2019-09-21T06:49:13Z

src/main/java/net/imglib2/algorithm/convolution/Convolution.java

@@ -34,6 +34,7 @@
 	/**
 	 * Set the {@link ExecutorService} to be used for convolution.
 	 */
+	@Deprecated


Without having studied the code too closely: does this work? I.e., if some existing downstream code is using setExecutor, won't that executor now be ignored, instead of used when executing? Or did I miss where this is handled in some descendant class?

This method now works again.

tpietzsch · 2019-09-21T11:51:04Z

I didn't look at it in detail yet, but I think the IterableLoopBuilder belongs in core and should be a separate PR

imagejan · 2020-11-04T09:23:46Z

What's the state of this PR? Can we have IterableLoopBuilder in imglib2 core?

maarzt · 2021-11-10T10:27:32Z

This PR requires imglib/imglib2#310 (IterableLoopBuilder) to be merged an released first.

maarzt · 2021-11-10T13:15:35Z

The imglib2 Parallelization context + multithreaded LoopBuilder allow to conveniently implement multi-threading into an imglib2 algorithms. (As shown in this PR) There's no need to have ExecutorService or numTasks parameter. The method signatures remain clean. But this raises an important question? How to deal with existing single threaded methods, should we make the multi-threaded?

Possible answers:

Make these methods multi-threaded. They can still be executed single-threaded by using Parallelization.runSingleThreaded(...). Consequence: Exiting code will be multi-threaded, and by default run faster. But also some existing code might run slower because of the multi-threading overhead or if too much multi-threading happens in parallel. Additionally some code with thread-unsafe RandomAccesses might break.
Keep the existing methods single-threaded. Add multi-threaded alternatives. Consequence: Nothing breaks. Performance stay constant. But also no speed up for existing code. People might not use the multi-threaded alternatives, because they are not aware of them.

@tpietzsch @axtimwalde @StephanPreibisch What are your thoughts?

tpietzsch · 2021-11-11T09:33:12Z

Also discussed gitter: https://gitter.im/imglib/imglib2?at=618bd07238377967f4a42c73

copy&paste from there:

Curtis Rueden @ctrueden Nov 10 16:10
@maarzt Awesome, thanks for all your work on this!
For what to do with existing algorithms: I think it depends on whether multithreading them would change the result.
If you arrive at the same deterministic answer as before, then I vote to definitely switch them over to multithreading.
If the result changes, but is still deterministic, you could add new method sig(s) and deprecate the old one(s).
If the result of multithreading it becomes non-deterministic, that's a problem for Ops and probably others using the algorithm, so we should be careful there.

Jean-Yves Tinevez @tinevez Nov 10 16:51
Awesome !

Tobias Pietzsch @tpietzsch Nov 10 17:36
I would also change to multi-threading by default. (or rather using current Parallelization context by default). I would prehaps keep the single-threaded versions under an explicit name (e.g. PartialDerivative.gradientForwardDifferenceSingleThreaded) for things that are possibly run on tiny things where even querying the Parallelization ThreadLocal could be considered too much overhead. These "kernel methods" are perhaps anyway used as building blocks for the multi-threaded versions, and I don't see a big downside to exposing them.

Tobias Pietzsch @tpietzsch Nov 10 17:41
But multi-threaded by default is good
@axtimwalde @StephanPreibisch opinions?

axtimwalde · 2021-11-11T23:53:02Z

Auto-multi-threading can be bad when parallelizing things with Spark. The Parallelization context will have unintended consequences when several tasks run in the same JVM (which is typical). Am I overthinking this?

tpietzsch · 2021-11-12T05:36:21Z

Auto-multi-threading can be bad when parallelizing things with Spark. The Parallelization context will have unintended consequences when several tasks run in the same JVM (which is typical). Am I overthinking this?

Maybe I'm "underthinking" it, but wouldn't it be sufficient to then just use Parallelization.runSingleThreaded(...)around each task?

maarzt · 2021-11-12T12:55:06Z

Hi @axtimwalde, I hope the parallelization context will be useful also for your code that runs on Spark. It is actually intended for those use cases. Where a part of the code should run single threaded. And no worries it's thread safe. It uses ThreadLocal to store one configuration per thread.

As Tobi suggested, all you need to do is to wrap the tasks that you execute one Spark into a call of Parallelization.runSingleThreaded(...). I hope it will for you. We otherwise need to fix the Parallelization context. I'm happy to look at any example where it breaks, or where you think it would break.

axtimwalde · 2021-11-12T15:48:08Z

May be I am not understanding this well enough. However, Parallelization.runSingleThreaded(...) calls TaskExecutors.singleThreaded(...) which returns a singleton executor for the entire JVM. That would mean that all parallel Spark tasks running on the same JVM now run sequentially instead of in parallel. This would not be right. It looks like Spark's and Parallelization's task management are trying to do the same thing in incompatible ways. But again, may be I am misinterpreting something here, haven't spent enough time, sorry.

maarzt · 2021-11-12T22:20:56Z

However, Parallelization.runSingleThreaded(...) calls TaskExecutors.singleThreaded(...) which returns a singleton executor for the entire JVM.

Yes that is correct. TaskExecutors.singleThreaded() returns a singleton executor. This executor is the only instance of the class SequentialTaskExecutor. But this won't cause any trouble. The class SequentialTaskExecutor has no fields and therefor no internal state or state changes. One could have multiple instances of SequentialTaskExecutor, it wouldn't make no difference. They would all behave exactly the same at any time... I wanted to make the overhead of calling TaskExecutors.singleThreaded() as small as possible. Returning an singleton is faster than creating new instances.

By the way, the same is true for TaskExecutors.multiThreaded() and the respective ForkJoinExecutorService. There is always only one instance, no fields, no internal state, no need to create multiple instances.

That would mean that all parallel Spark tasks running on the same JVM now run sequentially instead of in parallel.

No. The SequentialTaskExecutor isn't that powerful. Let us assume Spark is configured to start 8 threads in a JVM. Calling Parallelization.runSingleThreaded(...) won't change anything about this. All it does is preventing the imglib2 algorithms from spawning new threads. And I assume that's exactly what you need.

ctrueden · 2022-11-15T19:16:30Z

@maarzt @tpietzsch @axtimwalde Is this PR still relevant? It would be great to get this library using the unified multithreading approach!

axtimwalde · 2022-11-15T19:42:27Z

I think this is still relevant and looks good if running tasks SequentialTaskExecutor.getInstance() from multiple threads will not force-sequentialize them.

maarzt · 2022-11-17T16:56:25Z

I will go over this PR, check everything, and prepare it to be merged.

This simplifies the existing implementation while staying backwards compatible. The code adds a new method: Thresholder.threshold( source, threshold, above ) That is multi-threaded according to the Parallelization context.

…lib2 Parallelization context Add new methods DifferenceOfGaussian.DoG that don't require an ExecutorService but use the Parallelization context. Replace the old implementation with multi-threaded LoopBuilder.

No new methods are added. Existing methods behave the same, but can be configured using the Parallelization context.

Simplify the code by using multi-threaded LoopBuilder. Many existing methods become multi-threaded by default.

maarzt · 2022-12-07T13:42:02Z

Ok, I added tests and updated the javadoc. I would say, this PR is ready to be merged!

ctrueden reviewed Sep 21, 2019

View reviewed changes

maarzt force-pushed the parallel branch from 7626dbd to a980129 Compare June 5, 2020 14:52

imagejan mentioned this pull request Nov 4, 2020

mutual information and related methods #89

Open

maarzt mentioned this pull request Nov 4, 2021

Add IterableLoopBuilder imglib/imglib2#310

Merged

maarzt force-pushed the parallel branch from a980129 to 0171e91 Compare November 10, 2021 10:22

maarzt force-pushed the parallel branch from 0171e91 to 97a461f Compare December 2, 2022 10:49

maarzt added 11 commits December 6, 2022 15:30

Add utility class to simplify the creation of test images

630a8cb

Add unit test for class Thresholder

31f054a

Add unit test for class DifferenceOfGaussian

6fdf423

Refactor ConnectedComponentsTest and test more methods

481bf0f

Add unit tests for Dilation and Erosion

f5ff37a

Add unit test for class MorphologyUtils

bfaa8bd

Add unit test for class ComputeMinMax

6a4f3a2

POM: pin imglib2 version to 5.13.0

3184e2f

Change Thresholder implementation to multi-threaded LoopBuilder

ac3b901

This simplifies the existing implementation while staying backwards compatible. The code adds a new method: Thresholder.threshold( source, threshold, above ) That is multi-threaded according to the Parallelization context.

Update DifferenceOfGaussian to use multi-threaded LoopBuilder and img…

8397118

…lib2 Parallelization context Add new methods DifferenceOfGaussian.DoG that don't require an ExecutorService but use the Parallelization context. Replace the old implementation with multi-threaded LoopBuilder.

Update ConntectedComponenets to make use of the Parallelization context.

7e28716

No new methods are added. Existing methods behave the same, but can be configured using the Parallelization context.

maarzt added 10 commits December 6, 2022 15:34

Change TensorEigenValues to use multi-threaded LoopBuilder.

791c47f

Simplify the code by using multi-threaded LoopBuilder. Many existing methods become multi-threaded by default.

Change Dilation to use IterableLoopBuilder

4873a47

Change Erosion to use IterableLoopBuilder

f294d06

Update MorphologyUtils to use Parallelization framework

ddba38a

Update DistanceTransform to use parallelization framework

fe79678

Chang PartialDerivative to use multi-threaded LoopBuilder

3dd9ecd

Update LocalExtrema to use parallelization framework

23ea534

Change SubpixelLocalization to use the Parallelization framework

7fe1eaf

Update ComputeMinMax to use parallelization framework

5c81783

Add benchmark for LocalExtrema

6d4f829

maarzt force-pushed the parallel branch from 97a461f to 6d4f829 Compare December 6, 2022 15:23

maarzt requested a review from ctrueden December 7, 2022 13:42

ctrueden requested review from tpietzsch and axtimwalde February 5, 2023 04:35

maarzt marked this pull request as ready for review March 10, 2023 16:11

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use new imglib2 parallelization approach #83

Use new imglib2 parallelization approach #83

maarzt commented Sep 19, 2019

ctrueden Sep 21, 2019

maarzt Dec 7, 2022

tpietzsch commented Sep 21, 2019

imagejan commented Nov 4, 2020

maarzt commented Nov 10, 2021

maarzt commented Nov 10, 2021

tpietzsch commented Nov 11, 2021

axtimwalde commented Nov 11, 2021

tpietzsch commented Nov 12, 2021

maarzt commented Nov 12, 2021

axtimwalde commented Nov 12, 2021

maarzt commented Nov 12, 2021

ctrueden commented Nov 15, 2022

axtimwalde commented Nov 15, 2022

maarzt commented Nov 17, 2022

maarzt commented Dec 7, 2022

Use new imglib2 parallelization approach #83

Are you sure you want to change the base?

Use new imglib2 parallelization approach #83

Conversation

maarzt commented Sep 19, 2019

ctrueden Sep 21, 2019

Choose a reason for hiding this comment

maarzt Dec 7, 2022

Choose a reason for hiding this comment

tpietzsch commented Sep 21, 2019

imagejan commented Nov 4, 2020

maarzt commented Nov 10, 2021

maarzt commented Nov 10, 2021

tpietzsch commented Nov 11, 2021

axtimwalde commented Nov 11, 2021

tpietzsch commented Nov 12, 2021

maarzt commented Nov 12, 2021

axtimwalde commented Nov 12, 2021

maarzt commented Nov 12, 2021

ctrueden commented Nov 15, 2022

axtimwalde commented Nov 15, 2022

maarzt commented Nov 17, 2022

maarzt commented Dec 7, 2022