Best practices / questions when running different nextflow pipelines concurrently on HPC compute cluster with shared storage for scratch / workDir #4584
-
I was recently tasked with troubleshooting an issue with a Nextflow pipeline on our HPC compute cluster, which opened up a broader question: is it advisable to run multiple Nextflow pipelines on an HPC cluster all using the same workDir/scratch root path, e.g. /data/hot/scratch/nextflow/work? Generally speaking there should not be any directory naming conflicts/collisions, but the resume functionality seems to assume that the directory is primarily used by a single pipeline. What, if any, are the best practices for this?

We have a cron job that "cleans up" scratch space, removing anything older than 7 days, but my concern is that as more and larger pipelines run concurrently, this may cause issues in the future. Should we be using individual subdirectories per execution? Per pipeline?

Our current issue is that our cleanup script sometimes deletes an empty workDir that the pipeline believes exists right before the submitTask call in checkCachedOrLaunchTask, and the task then fails to create .command.run when it goes to execute because the directory is no longer there.
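For reference, our current setup and the per-pipeline variant I'm asking about look roughly like this (a simplified sketch; the per-pipeline subdirectory name is just illustrative):

```groovy
// nextflow.config — current setup: every pipeline and run shares one work root
workDir = '/data/hot/scratch/nextflow/work'

// variant under consideration: a subdirectory per pipeline (name is illustrative)
// workDir = '/data/hot/scratch/nextflow/work/my_pipeline'
```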
-
Concurrent pipeline runs are guaranteed not to collide with each other, because each run has a unique session ID that is included in the task hash (i.e. the task work directory name). Personally I like using a single work directory for all my pipelines, especially in scratch storage with a cleanup policy, so I can set it and forget about it.

The important things for your cleanup policy are that (1) the maximum retention is greater than the total walltime of most pipeline runs on your cluster and (2) the policy only deletes files based on age. It sounds like your policy is deleting empty directories regardless of age, which could be a problem in exactly the scenario you described.
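For example, a single shared work root can be set once in a config that every run picks up (a minimal sketch; the path is the one from the question, the rest is assumed):

```groovy
// shared nextflow.config: one work root for all pipelines and all runs.
// Each task is placed under work/<xx>/<hash...>, and the hash incorporates
// the run's unique session ID, so concurrent runs never share a task directory.
workDir = '/data/hot/scratch/nextflow/work'
```

With that in place, the only thing the scratch policy needs to care about is the age of the files, not which pipeline produced a given task directory.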
-
When publishing pipeline files with publishDir from a work directory in scratch, it is also important to specify the copy mode rather than the default symlink mode, since the files in scratch will eventually be deleted. By default, files are published to the target folder by creating a symbolic link for each process output that points back to the file in the process work directory.
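For example (a minimal sketch; the process name and target path are placeholders, not from the thread):

```groovy
process EXAMPLE {
    // copy results out of scratch instead of symlinking them, so they survive
    // the scratch cleanup policy once the work directory is deleted
    publishDir '/data/results/my_run', mode: 'copy'

    output:
    path 'result.txt'

    script:
    """
    echo done > result.txt
    """
}
```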