Replies: 7 comments
-
Within one frame, the atomic energies are computed in a parallelized way, i.e. the energy contributions from different atoms are evaluated in parallel by multiple threads. The neural network part is implemented with TensorFlow, while the descriptor part is implemented in deepmd-kit. By default, TensorFlow tries to use all threads available on one cluster node. Could you please provide more details on how you controlled the number of threads during the training? Did you set the environment variable OMP_NUM_THREADS and use the command line option --inter-threads?
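For concreteness, a minimal sketch of the setup being asked about. The option name `--inter-threads` is taken from this thread; the exact command line depends on the deepmd-kit version, so this only illustrates the environment-variable half:

```python
import os

# Cap the intra-op thread pool: TensorFlow's OpenMP-backed kernels read
# OMP_NUM_THREADS at startup, so it must be set before TF is imported.
os.environ["OMP_NUM_THREADS"] = "4"

# Training would then be launched with the inter-op thread count passed on
# the command line (the --inter-threads option mentioned above).
print(os.environ["OMP_NUM_THREADS"])  # -> 4
```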
-
Thanks for your reply.
-
It can be helpful; the training I'm trying to run is something like:
"stop_batch": 1000000,
Thanks again
-
The OMP_NUM_THREADS variable and the -t option set the intra_op_parallelism_threads and inter_op_parallelism_threads parameters when creating a TensorFlow session. I expected the total number of threads used during training to be limited to intra_op_parallelism_threads * inter_op_parallelism_threads. Then I ran a test on my laptop, which has 4 physical CPU cores... Thus intra_op_parallelism_threads and inter_op_parallelism_threads together cannot really limit the total number of threads used during training. This may be the reason why you do not see any difference in performance when changing these parameters. I searched a little and saw the issues reported to the TF development team, and it seems that there is no good solution.
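As a side note, one way to measure the actual thread count in such a test, independently of the session parameters, is to count the OS-level threads of the process. A Linux-only sketch (it reads /proc, so it does not require TensorFlow):

```python
import os

def num_native_threads() -> int:
    """Count OS-level threads of the current process via /proc/self/task (Linux)."""
    return len(os.listdir("/proc/self/task"))

# Run this inside a training script to see how many threads TF really spawned;
# in a fresh single-threaded interpreter it returns a small number.
print(num_native_threads())
```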
-
Thank you very much for your help. Best, Filippo
-
deepmd-kit set "inter_op_parallelism_threads" in a wrong way. Please take a look at the bug fix df917f6 on branch master. Now the CPU resource used to parallelize the computation is correctly controlled. Notice that one still cannot set the exact number of threads. You are referred to
Best,
-
Thank you very much for the clarification and for pointing me to the bug fix.
-
Hello to everybody!
I would like to kindly ask the experts for a clarification on how parallelisation practically works within DeePMD training. What is not exactly clear to me is where parallelisation enters the game. If I have, say, 2500 training batches containing 2 frames each, it would evaluate the loss function ideally on 2 structures in parallel, right? I'm asking because, under these very same conditions, using 2 OpenMP threads on a 16-CPU cluster node, if I increase the batch size to 4 structures and use 4 OpenMP threads it doesn't seem to show any scaling: it takes about double the time, whereas I was expecting it to take roughly the same time. I don't understand where I'm wrong in interpreting this problem.
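To make the expectation above explicit: if frames in a batch were the unit of parallel work and each frame cost the same, the ideal wall time per batch would scale like ceil(batch_size / n_threads), so (2 frames, 2 threads) and (4 frames, 4 threads) should cost the same. A toy model of that reasoning (the function and its uniform per-frame cost are illustrative, not deepmd-kit code):

```python
import math

def ideal_batch_time(batch_size: int, n_threads: int, per_frame_cost: float = 1.0) -> float:
    # Assumes perfect parallelism over frames and a uniform per-frame cost.
    return math.ceil(batch_size / n_threads) * per_frame_cost

print(ideal_batch_time(2, 2))  # 1.0
print(ideal_batch_time(4, 4))  # 1.0 -- same ideal time, hence the expectation
```

The observed doubling instead matches ceil(4 / effective_threads) with effective_threads stuck near 2, which is consistent with the thread limits not taking effect as discussed earlier in this thread.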
Thanks for your time.
All the best.