Best practices / questions when running different nextflow pipelines concurrently on HPC compute cluster with shared storage for scratch / workDir #4584
-
I was recently tasked with troubleshooting an issue with a Nextflow pipeline on our HPC compute cluster, which opened up a broader question: is it advisable to run multiple Nextflow pipelines on an HPC cluster all using the same workDir/scratch root path, e.g. /data/hot/scratch/nextflow/work? Generally speaking there should not be any directory naming conflicts/collisions, but the resume functionality seems to assume that the directory is primarily used by a single pipeline. What, if any, are the best practices for this?

We have a cron job that "cleans up" scratch space, removing anything older than 7 days, but my concern is that as more and larger pipelines run concurrently, this may cause issues in the future. Should we be using individual subdirectories per execution? Per pipeline?

Our current issue is that our cleanup script sometimes deletes an empty workDir that the pipeline believes exists right before the submitTask call in checkCachedOrLaunchTask, and the task then fails to create .command.run when it goes to execute because the directory is no longer there.
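For reference, our current setup and the per-pipeline variant I'm asking about look roughly like this (a simplified sketch; the per-pipeline subdirectory name is just illustrative):

```groovy
// nextflow.config — current setup: every pipeline and run shares one work root
workDir = '/data/hot/scratch/nextflow/work'

// variant under consideration: a subdirectory per pipeline (name is illustrative)
// workDir = '/data/hot/scratch/nextflow/work/my_pipeline'
```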
-
Concurrent pipeline runs are guaranteed not to collide with each other, because each run has a unique session ID that is included in the task hash (i.e. the task work directory name). Personally I like using a single work directory for all my pipelines, especially in scratch storage with a cleanup policy, so I can set it and forget about it.

The important things for your cleanup policy are that (1) the maximum retention is greater than the total walltime of most pipeline runs on your cluster and (2) the policy only deletes files based on age. It sounds like your policy is deleting empty directories regardless of age, which could be a problem in exactly the scenario you described.
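For example, a single shared work root can be set once in a config that every run picks up (a minimal sketch; the path is the one from the question, the rest is assumed):

```groovy
// shared nextflow.config: one work root for all pipelines and all runs.
// Each task is placed under work/<xx>/<hash...>, and the hash incorporates
// the run's unique session ID, so concurrent runs never share a task directory.
workDir = '/data/hot/scratch/nextflow/work'
```

With that in place, the only thing the scratch policy needs to care about is the age of the files, not which pipeline produced a given task directory.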
-
When publishing pipeline files with publishDir from a work directory in scratch, it is also important to specify the copy mode rather than the default symlink mode, since the files in scratch will eventually be deleted. By default, files are published to the target folder by creating a symbolic link for each process output that points back to the file in the process work directory.
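For example (a minimal sketch; the process name and target path are placeholders, not from the thread):

```groovy
process EXAMPLE {
    // copy results out of scratch instead of symlinking them, so they survive
    // the scratch cleanup policy once the work directory is deleted
    publishDir '/data/results/my_run', mode: 'copy'

    output:
    path 'result.txt'

    script:
    """
    echo done > result.txt
    """
}
```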