This repository has been archived by the owner on Nov 24, 2022. It is now read-only.
Implement parallel `ahc-ld`/`ahc-link` #621
Is your feature request related to a problem? Please describe.
Recent performance improvements have cut wall-clock time by over 50% when linking large programs. For further improvements, we should introduce parallelism in `ahc-ld` and `ahc-link`. There are multiple chances for parallelism, which are explained in the next section.

Describe the solution you'd like
Chances of parallelism

- In `ahc-ld`, we can parallelize the deserialization of each object file. All object files are first converted to `ByteString`s, either by reading them directly or via `ArchiveEntry`; the deserialization can then be performed in parallel.
- `AsteriusModule` values should be fully evaluated, and this can be done in parallel as well.
- In the `binaryen` backend, we can parallelize the marshaling of different data segments and functions. `binaryen` transparently switches to a new allocator when it notices it's allocating an IR node on a different thread, so we should ensure each Haskell worker thread is pinned using `forkOn`.
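As a rough illustration of the list above, the following sketch decodes a batch of inputs on threads pinned with `forkOn` and forces each result inside the worker before collecting it. The names `decodePinned`, `await`, `decode` and `force` are hypothetical stand-ins and not part of the Asterius code base; the real deserializer and evaluation function would take their place. For brevity it spawns one lightweight thread per input rather than using a bounded pool.

```haskell
import Control.Concurrent (forkOn, getNumCapabilities)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)
import Control.Exception (SomeException, evaluate, throwIO, try)
import Control.Monad (forM)

-- | Decode every input on a pinned thread and return the results in input
-- order. 'decode' and 'force' are hypothetical stand-ins for the real
-- object-file deserializer and a deep-evaluation function.
-- Real parallelism needs the threaded RTS (-threaded, +RTS -N).
decodePinned :: (a -> b) -> (b -> ()) -> [a] -> IO [b]
decodePinned decode force inputs = do
  caps <- getNumCapabilities
  slots <- forM (zip [0 ..] inputs) $ \(i, x) -> do
    slot <- newEmptyMVar
    -- Pin the worker to capability (i mod caps) so it does not migrate.
    _ <- forkOn (i `mod` caps) $ do
      r <- try $ do
        let y = decode x
        evaluate (force y) -- evaluate the result inside the worker
        pure y
      putMVar slot r
    pure slot
  -- Wait for every worker; rethrow the first failure in input order.
  mapM await slots

-- | Block until a worker has filled its slot, rethrowing its exception.
await :: MVar (Either SomeException b) -> IO b
await slot = takeMVar slot >>= either throwIO pure
```

Catching the exception in the worker and rethrowing it in the caller keeps a failed decode from leaving the caller blocked on an empty slot.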
Method of parallelism

We cannot introduce additional dependencies like `parallel`, `monad-par` or `scheduler` here, since we need to strictly control our dependency surface. So we need to roll our own minimal parallelism framework first.

The need for nested parallelism can be avoided for our use cases. A simple parallel loop should be sufficient:
The first argument is the worker thread pool capacity, which should equal the number of CPU cores.
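Such a loop could take roughly the following shape. This is a minimal sketch using only `base`; the name `parallelFor`, the arguments after the leading pool capacity, and the choice to return results in input order are assumptions for illustration, not the API from the original proposal.

```haskell
import Control.Concurrent (forkOn)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)
import Control.Exception (SomeException, throwIO, try)
import Control.Monad (forM, forM_)
import Data.IORef (atomicModifyIORef', newIORef)

-- | Hypothetical parallel loop: run 'f' over 'xs' with at most 'n' pinned
-- worker threads and return the results in input order.
parallelFor :: Int -> [a] -> (a -> IO b) -> IO [b]
parallelFor n xs f
  | n <= 1 = mapM f xs -- sequential path, no threads are spawned
  | otherwise = do
      -- Pair every item with an empty slot for its result.
      jobs <- forM xs $ \x -> do
        slot <- newEmptyMVar
        pure (x, slot)
      queue <- newIORef jobs
      -- Worker i is pinned to capability i (modulo the capability count).
      forM_ [0 .. n - 1] $ \i -> forkOn i (worker queue)
      forM jobs $ \(_, slot) -> takeMVar slot >>= rethrow
  where
    -- Each worker pops jobs from the shared queue until it is empty.
    worker queue = do
      next <- atomicModifyIORef' queue pop
      case next of
        Nothing -> pure ()
        Just (x, slot) -> do
          r <- try (f x)
          putMVar slot r
          worker queue
    pop [] = ([], Nothing)
    pop (j : js) = (js, Just j)

-- | Rethrow a worker's exception in the calling thread.
rethrow :: Either SomeException b -> IO b
rethrow = either throwIO pure
```

A shared queue lets idle workers pick up remaining items, which helps when items differ widely in cost, as object files of different sizes would.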
In addition, we should implement a link-time option for `ahc-ld`/`ahc-link` to allow overriding the worker thread pool size; setting it to `1` should fall back to sequential code to avoid threading overhead.
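One way the pool size could be resolved is sketched below. The function name `resolvePoolSize` is hypothetical, and how the override reaches it (flag name, parsing) is deliberately left out. The sketch assumes the default is the core count reported by `getNumProcessors` and that a non-threaded runtime should always get a pool size of 1.

```haskell
import Control.Concurrent (rtsSupportsBoundThreads, setNumCapabilities)
import Data.Maybe (fromMaybe)
import GHC.Conc (getNumProcessors)

-- | Resolve the worker pool size from an optional user override
-- (hypothetical helper). 'Nothing' means "use the CPU core count".
resolvePoolSize :: Maybe Int -> IO Int
resolvePoolSize override
  -- Without the threaded RTS there is only one capability, so extra
  -- workers would add overhead without any parallelism.
  | not rtsSupportsBoundThreads = pure 1
  | otherwise = do
      cores <- getNumProcessors
      -- Clamp nonsensical values such as 0 or negatives to 1.
      let n = max 1 (fromMaybe cores override)
      -- Make sure the RTS has enough capabilities for the pool.
      setNumCapabilities n
      pure n
```

A result of 1 would then be the signal for callers to take the sequential path and skip thread creation entirely.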