[proposal] Move some publishDir attributes to process-level directives #4661
Comments
// process directive
process FASTQC {
    publish 'results', mode: 'copy', saveAs: { ... }
    // ...
}

// config option
process {
    publish.path = 'results'
    publish.mode = 'copy'
    publish.saveAs = { ... }
}

It is essentially giving up the ability to target specific outputs differently; instead, the setting would apply to the entire process. That seems to be what people do almost every time, but I would like to investigate it more. But I do suggest making a new process directive, even something as simple as
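For context, the existing publishDir directive can already be declared more than once in a single process, each with its own pattern, so different outputs can be published to different places with different modes. That per-output flexibility is what a single process-wide setting would give up. A minimal sketch, assuming a params.outdir parameter and FastQC-style .html/.zip outputs (both illustrative, not taken from this thread):

```groovy
// Illustrative sketch: params.outdir and the output patterns are assumptions
params.outdir = 'results'

process FASTQC {
    // HTML reports are copied into one directory...
    publishDir "${params.outdir}/fastqc/html", mode: 'copy', pattern: '*.html'
    // ...while zip archives are only symlinked into another
    publishDir "${params.outdir}/fastqc/zip", mode: 'symlink', pattern: '*.zip'

    input:
    path reads

    output:
    path '*.html'
    path '*.zip'

    script:
    """
    fastqc ${reads}
    """
}
```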
I don't really follow what the difference is here. You mean that with the current
Yup, absolutely, good point 👍🏻
I was thinking of the
It's a -1 for me; we are going to deprecate
@pditommaso, do you think the publish mode (i.e. copy vs. symlink) should be defined in the output schema?
I'd see a top-level (default) setting; there's no real need to configure it task by task.
I could live with that, although I have a feeling there are users out there who are setting it differently per process.
Definitely, I know of at least one use case for having
The other option in that issue is
Setting them as top-level defaults that can be overridden with process-specific config would be great, though. That's basically what this issue is all about ☝🏻 But when it's a single directive, I don't think individual attributes can be set as defaults or overridden like that, which is why I was proposing to split them up. Am I wrong about that?
I would see value in also having an upgraded
My impression is that having the schema as the sole way to specify outputs (and inputs) can be a barrier to adoption for new users. Beginners and simple pipelines would benefit greatly from having a fallback
@ewels My take is that these directives are needed because you still think in
We need to move away from this model, where the workflow output is buried in the process definition and a huge config file then controls all these settings. I don't rule out that some of these settings could become directives, but we have to look case by case to address specific needs.
@marcodelapierre Good point, it should definitely not become a mandatory requirement to create a schema file just for your workflow to write some output files. I agree it would create too much friction.
Agreed: the above proposal would basically be a stop-gap, hopefully quite easy to implement, a minor change with immediate benefit. It could then be replaced by a new system when that is ready.
A stop-gap is forever. We need to move fast with #4670 instead.
Closing in favor of #4784.
Based on a comment originally posted in #4205 (comment)
publishDir supports multiple attributes that control file publishing behaviour. The problem with structuring them this way is that if any of them needs to be overridden, the entire publishDir statement has to be repeated. This leads to some attributes being repeated verbatim many times in pipeline configs.

This issue suggests moving attributes out into new directives, so that each can be set to a single value for the entire pipeline. This would reduce duplicated configuration code. It could cover all current attributes, but the main priorities (the most duplicated) are:
- path (ideally split into two: a "base" path and the current path attribute)
- mode
- saveAs
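For illustration, this is roughly what the duplication looks like in a current config: a pipeline-wide publishDir default, plus a per-process override that has to restate mode and saveAs verbatim just to change the path. This is a sketch only; params.outdir, params.publish_dir_mode and the saveAs rule are assumptions modelled on common nf-core conventions, not taken from this issue.

```groovy
// nextflow.config (illustrative sketch; parameter names are assumptions)
params {
    outdir           = 'results'
    publish_dir_mode = 'copy'
}

process {
    // Pipeline-wide default: publish into a directory named after the process
    publishDir = [
        path:   { "${params.outdir}/${task.process.tokenize(':')[-1].toLowerCase()}" },
        mode:   params.publish_dir_mode,
        saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
    ]

    withName: 'FASTQC' {
        // Only the path differs, but publishDir is assigned as a whole,
        // so mode and saveAs have to be repeated verbatim
        publishDir = [
            path:   { "${params.outdir}/qc/fastqc" },
            mode:   params.publish_dir_mode,
            saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
        ]
    }
}
```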
If possible, these would live under a publishDir scope, though I don't know if that is technically possible.

So ideally, it'd be something like taking the following example:
And it becoming: