see ComputationalRadiationPhysics#51
Some observations:
- we have only trivial (and fast) loops
- the other loops are integral steps of the simulation and cannot be
  parallelized (the steps are sequential and may involve device code)
- one of the parsing loops calls cudaSetDevice; not sure whether it's
  possible to parallelize that in a good way.
- parallelizing a loop over a std::vector is ok as long as its length is
  fixed (no reallocation). That means we may not use vector.push_back() or
  vector.insert() inside a loop with OpenMP pragmas. The compiler might
  not complain about it...
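
A minimal sketch of the std::vector point above, not taken from the codebase: `squareAll` is a hypothetical helper, and the squaring is only a placeholder for real per-element work. The output vector is sized before the loop, so each iteration writes to its own element and nothing reallocates.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the project): squares every element.
std::vector<double> squareAll(const std::vector<double>& input)
{
    // Allocate the full output up front; the length stays fixed in the loop.
    std::vector<double> output(input.size());

    // A signed loop index keeps this compatible with older OpenMP versions.
    const long n = static_cast<long>(input.size());

    // Each iteration touches only output[i], so threads never share an element.
    // If OpenMP is not enabled, the pragma is ignored and the loop runs serially.
    #pragma omp parallel for
    for (long i = 0; i < n; ++i)
    {
        const std::size_t idx = static_cast<std::size_t>(i);
        output[idx] = input[idx] * input[idx];
    }

    return output;
}
```

Calling `output.push_back(...)` inside that loop instead would let several threads modify the vector's size and storage at once, which is a data race even though it compiles cleanly.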
Only "easy" loops were parallelized, and only basic pragmas were used.
Might give some speedup one day, but if not... no problem. Pragmas and
code changes are non-intrusive enough to keep it maintainable.
There might be some loops that are a bottleneck. They could probably be parallelized rather easily with OpenMP (the CMake file will need to be tweaked).