Replies: 8 comments 12 replies
-
Yes, this is ugly. Which Nextflow version are you using?
-
I'll add that I've just run into this issue. It's been trying to publish one file (to S3) for the last 8 hours. The file is only ~20 GB in size; I know because I've run the pipeline on this exact same data before. I don't know whether I should kill and rerun the pipeline or let it keep going.
-
Hi all, I ran into the same issue using Nextflow version 22.10.0 in combination with the Azure Batch executor. I also saw this line in my log files:
[main] DEBUG nextflow.util.ThreadPoolBuilder - Creating thread pool 'FileTransfer' minSize=10; maxSize=36; workQueue=LinkedBlockingQueue[10000]; allowCoreThreadTimeout=false
Is there a fix for this issue or a way to update these settings?
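One thing that may be worth checking, assuming a Nextflow release recent enough to honour the threadPool configuration scope (an assumption, not a confirmed fix for 22.10.0), is whether the 'FileTransfer' pool can be resized from nextflow.config, for example:
// Untested sketch: the threadPool scope may not be supported by older releases
threadPool {
    FileTransfer {
        maxThreads = 64   // example value only
    }
}
If the setting is picked up, the log line above should report the new maxSize.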
-
An interesting data point... We seem to be hitting this when the output is a directory, but not individual file globs. For example, the following will fail consistently (and take HOURS to fail):
But this will succeed within minutes (it's the exact same set of files):
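As a purely hypothetical illustration (not the original snippets; process, bucket, and file names are made up), the difference is roughly between publishing a whole directory output and publishing individual file globs:
// Hypothetical sketch of the two output patterns
process MAKE_RESULTS {
    publishDir "s3://my-bucket/results", mode: "copy"   // made-up destination

    output:
    // path "outdir"            // variant 1: publish the whole directory (the slow case reported above)
    path "outdir/*.txt"         // variant 2: publish individual files via a glob (the fast case)

    script:
    """
    mkdir -p outdir
    for i in 1 2 3; do echo sample \$i > outdir/sample_\$i.txt; done
    """
}

workflow {
    MAKE_RESULTS()
}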
-
I am also encountering the same issue running the pipeline on AWS Batch. The same pipeline did not have this issue a few months ago. When the pipeline completes in about 2 h and needs to transfer around 80 GB total across multiple files, I'm getting a message
However, the run is still ongoing at 20 h. I also tried changing the outputs to individual file globs (which have worked for me in the other pipeline), but that did not seem to help here.
-
I am experiencing this same issue but am not writing to S3 (though I am using an EC2 instance). My log message:
No processes have run for the last 11 hours, though there are still more processes to run in this step. It looks like it finished process ~795/1328 before pausing. There are no files in the output directory, and definitely more than 7 have been created in the work dir and need to be moved there (~795, to be exact). The files themselves aren't very large, most under a MB. I tried this before with a small test subset of samples and it worked just fine. I have plenty of space left on the disk. My Nextflow version is 22.10.1, with DSL2.
-
One thing you all can try is to enable virtual threads. See this blog post for details, but here's the gist:
Then virtual threads will be enabled automatically. I have done some benchmarks and found that this feature can significantly reduce the time to publish files at the end of the workflow, especially when copying from S3 to S3. I haven't done any benchmarks for Google Cloud Storage, but there might be some benefit; worth trying in any case.
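A rough sketch of the usual setup (not the original gist; the Java package identifier below is just an example, and the environment variable is an assumption that applies to Nextflow releases with virtual-thread support on Java 19/20, whereas newer Java versions reportedly need no flag at all):
# Install a recent Java via SDKMAN (identifier is an example)
sdk install java 21.0.2-tem
sdk use java 21.0.2-tem

# On Java 19/20, opt in explicitly (assumption: honoured by your Nextflow version)
export NXF_ENABLE_VIRTUAL_THREADS=true

nextflow run <pipeline> ...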
-
I've just run into this issue again and have been facing it for the last few months. I tried switching NF versions; however, this results in an OOM error from Java (see below). My processes don't even complete before being terminated.
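If the OOM is coming from the Nextflow head process itself (an assumption), one knob worth knowing about is the NXF_OPTS environment variable, which passes JVM options such as the maximum heap size to the Nextflow launcher; the values below are examples only:
# Example only: give the Nextflow launcher JVM more heap
export NXF_OPTS='-Xms1g -Xmx8g'
nextflow run <pipeline> ...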
-
Hello,
I have been using Nextflow for a while now, but primarily on AWS; recently I switched to using it on Google Cloud. I noticed that once the jobs are finished, the pipeline takes a really long time to transfer/copy files from the work directory to the final destination folder. The work directory is in the same bucket as the final destination.
For reference, here is my publishDir directive:
publishDir params.alignments_outdir, mode: "copy"
And the log statements to highlight how long it took:
For a total of ~11 hours. The biggest files in this group are about 6.3 GB each, and there are 154 of those; the rest are small .json files, a few MB each. The total size being transferred is a little less than 1 TB.
I have transferred that much data between folders in a bucket on Google Cloud before using the gsutil command, and it takes nowhere near as much time. For example, using a VM on Google Cloud to transfer files between two folders in the same bucket:
[1.4 GiB/ 2.1 TiB] 32% Done 3.3 GiB/s ETA 00:07:16
using the command
gsutil -m cp -r
so it should be much faster than the whopping 11 hours it took. I also believe that Nextflow uses the gsutil cp command as well.
I was wondering if maybe there is a setting I am missing? I have looked around for solutions to this specific issue on Google Cloud but haven't found one that works. Here is my nextflow.config gls profile:
gls {
    process.executor = "google-lifesciences"
    docker.enabled = true
    google.location = "us-west2"
    google.region = "us-west1"
    google.lifeSciences.cpuPlatform = "Intel Skylake"
    google.lifeSciences.bootDiskSize = "100.GB"
    google.storage.parallelThreadCount = 100
    google.storage.maxParallelTransfers = 100
}
I was looking through the logs and found this statement, and I was wondering if it's slow because it's limited to 4 transfers at a time, but I can't find a setting to increase it to check. I added the "parallelThreadCount" and "maxParallelTransfers" options to my config, but neither of these seems to change the 'FileTransfer' thread pool size.
Jul-14 03:48:18.471 [Task monitor] DEBUG nextflow.util.ThreadPoolBuilder - Creating thread pool 'FileTransfer' minSize=4; maxSize=4; workQueue=LinkedBlockingQueue[10000]; allowCoreThreadTimeout=false
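One untested possibility, again assuming a Nextflow version recent enough to honour the threadPool configuration scope (an assumption rather than a documented guarantee for every release), would be to add something like the following alongside the gls profile and check whether the 'FileTransfer' maxSize reported in the log changes:
threadPool.FileTransfer.maxThreads = 32   // example value; untested sketch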