Replies: 7 comments
-
Within one frame, the atomic energies are computed in a parallelized way, i.e. the energy contributions from different atoms are evaluated in parallel by multiple threads. The neural network part is implemented with TensorFlow, while the descriptor part is implemented in deepmd-kit. By default, TensorFlow tries to use all threads available on one cluster node. Could you please provide more details on how you controlled the number of threads during the training? Did you set the environment variable OMP_NUM_THREADS and use the command line option --inter-threads?
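For concreteness, a minimal sketch of the setup being asked about. The option name `--inter-threads` is taken from this thread; the exact command line depends on the deepmd-kit version, so this only illustrates the environment-variable half:

```python
import os

# Cap the intra-op thread pool: TensorFlow's OpenMP-backed kernels read
# OMP_NUM_THREADS at startup, so it must be set before TF is imported.
os.environ["OMP_NUM_THREADS"] = "4"

# Training would then be launched with the inter-op thread count passed on
# the command line (the --inter-threads option mentioned above).
print(os.environ["OMP_NUM_THREADS"])  # -> 4
```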
-
Thanks for your reply.
-
It can be helpful; the training I'm trying to run is something like:
"stop_batch": 1000000,
Thanks again
-
The OMP_NUM_THREADS variable and the -t option set the intra_op_parallelism_threads and inter_op_parallelism_threads parameters when creating a TensorFlow session. I expected the total number of threads used during training to be limited to intra_op_parallelism_threads * inter_op_parallelism_threads. Then I ran a test on my laptop, which has 4 physical CPU cores... Thus intra_op_parallelism_threads and inter_op_parallelism_threads together cannot really limit the total number of threads used during training. This may be the reason why you do not see any difference in performance when changing these parameters. I searched a little and saw the issues reported to the TF development team, and it seems that there is no good solution.
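As a side note, one way to measure the actual thread count in such a test, independently of the session parameters, is to count the OS-level threads of the process. A Linux-only sketch (it reads /proc, so it does not require TensorFlow):

```python
import os

def num_native_threads() -> int:
    """Count OS-level threads of the current process via /proc/self/task (Linux)."""
    return len(os.listdir("/proc/self/task"))

# Run this inside a training script to see how many threads TF really spawned;
# in a fresh single-threaded interpreter it returns a small number.
print(num_native_threads())
```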
-
Thank you very much for your help. Best, Filippo
-
deepmd-kit set "inter_op_parallelism_threads" in a wrong way. Please take a look at the bug fix df917f6 on branch master. Now the CPU resource used to parallelize the computation is correctly controlled. Notice that one still cannot set the exact number of threads. You are referred to
Best,
-
Thank you very much for the clarification and for pointing me to the bug fix.
-
Hello to everybody!
I would like to kindly ask the experts for a clarification on how parallelisation practically works within DeePMD training. What is not exactly clear to me is where parallelisation enters the game. If I have, say, 2500 training batches containing 2 frames each, it would evaluate the loss function ideally on 2 structures in parallel, right? I'm asking because, under these very same conditions, using 2 OpenMP threads on a 16-CPU cluster node, if I increase the batch size to 4 structures and use 4 OpenMP threads it doesn't seem to show any scaling: it takes about double the time, whereas I was expecting it to take roughly the same time. I don't understand where I'm wrong in interpreting this problem.
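To make the expectation above explicit: if frames in a batch were the unit of parallel work and each frame cost the same, the ideal wall time per batch would scale like ceil(batch_size / n_threads), so (2 frames, 2 threads) and (4 frames, 4 threads) should cost the same. A toy model of that reasoning (the function and its uniform per-frame cost are illustrative, not deepmd-kit code):

```python
import math

def ideal_batch_time(batch_size: int, n_threads: int, per_frame_cost: float = 1.0) -> float:
    # Assumes perfect parallelism over frames and a uniform per-frame cost.
    return math.ceil(batch_size / n_threads) * per_frame_cost

print(ideal_batch_time(2, 2))  # 1.0
print(ideal_batch_time(4, 4))  # 1.0 -- same ideal time, hence the expectation
```

The observed doubling instead matches ceil(4 / effective_threads) with effective_threads stuck near 2, which is consistent with the thread limits not taking effect as discussed earlier in this thread.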
Thanks for your time.
All the best.