Support a parallel fold variant for the order preservation of final data processing results #113

elfring · 2023-10-14T15:55:19Z

The following information is provided in the section “Preservation of output order in Parmap”.

No reordering logic is implemented for parmapfold, parfold and their variants,
🔮 Will the chances grow to adjust this software limitation?
as performing these operations in parallel only make sense if the order is irrelevant.
💭 I suggest to reconsider this view.
- The size and difficulty can vary for data processing tasks.
- Thus it can happen that such tasks are finished with different run times while the desired progress is also influenced by resource allocation and usage (according to the operating system).
Would you like to become able to collect computed results with a known order? 🤔

The text was updated successfully, but these errors were encountered:

rdicosmo · 2023-10-15T15:01:47Z

Hi @elfring, I fear reading this issue that the documentation you are pointing at is confusing, so it is that documentation that should be changed, and not the code.

Let me try to explain and check if you agree: the original implementation of the map functions did not preserve the order, to maximise efficiency, so when using the chunksize parameter something like parmap (fun x -> x+1) [1;2;3;4;5] may return something like [6;2;3;4;5] instead of the expected [2;3;4;5;6]. This is why there is now a keeporder parameter that forces the result of the map function to be in the same order as the input (at some extra price in terms of execution time) even when chunksize is used.

Fold operations are different: the output of the function is the result of aggregating the input values into a single element, so, "preserving the input order" does not mean the same as for a map function. The only "order preservation" would be the order in which one accumulates the elements of the input during the computation. Now, when one implements something like List.fold_right (+) l 0 in parallel, the order in which the different elements are "accumulated" to produce the final result is necessarily not sequential (otherwise there is nothing you can gain going in parallel), so it is well known that something like a parfold only make sense if the folding operator (here, the +) is commutative and associative. This is what the documentation tries to say.

Unfortunately, it seems that as it is written, it may give the impression that one gives up parallelism. This is not the case: the implementation of the fold operations is indeed parallel and uses load balancing, as it is implemented via the general mapper function (see https://github.com/rdicosmo/parmap/blob/master/src/parmap.ml#L525), so performance is not an issue here, you already get the best ;-)

Maybe the best way forward is just to remove from the documentation the paragraph "No reordering logic is implemented for parmapfold, parfold and their variants, as performing these operations in parallel only make sense if the order is irrelevant." altogether?

elfring · 2023-10-15T17:16:12Z

Fold operations are different: the output of the function is the result of aggregating the input values into a single element, …

💭 It seems that we stumble on general communication difficulties.
🔮 Can the “preservation of an output order” be occasionally also desirable for parallel fold operations?
Which algorithm would be chosen for the combination of final data processing results? 🤔

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support a parallel fold variant for the order preservation of final data processing results #113

Support a parallel fold variant for the order preservation of final data processing results #113

elfring commented Oct 14, 2023

rdicosmo commented Oct 15, 2023

elfring commented Oct 15, 2023

Support a parallel fold variant for the order preservation of final data processing results #113

Support a parallel fold variant for the order preservation of final data processing results #113

Comments

elfring commented Oct 14, 2023

rdicosmo commented Oct 15, 2023

elfring commented Oct 15, 2023