Class-based process inputs #4912
Midnighter started this conversation in Ideas
Replies: 2 comments 4 replies
-
Already on it: #4553. Would love to hear your feedback. If you end up testing it, post your feedback here instead of on the PR and we can discuss.
-
@bentsherman I couldn't tell from the description of your PR: did you find a way of combining record input with input path options? I rarely need it, but once in a while it's extremely useful to stage files in a subdirectory.
-
I've been mulling over some thoughts for a while, so I wanted to share and see what folks think.
If I saw the same parameters passed to different functions over and over again
in source code, I would call that an anti-pattern and suggest combining them
into a meaningful object. Unfortunately, this pattern is extremely common in
Nextflow and nf-core pipelines because we typically care about files
representing biological data together with their meta information. So something
like the following is very widely used:
input: tuple val(meta), path(pair_1), path(pair_2)
I would love to be able to instead define this as:
I see multiple advantages to this:
- The object's name and attributes are much more descriptive, making it easier
  to understand their use in processes.
- Files and meta information are inseparable. That simplifies use in processes
  and channel transformations. It becomes impossible to introduce bugs where
  meta information is combined with the wrong file.
- All the benefits of immutable data structures. We see many subtle bugs with
  modified meta maps and issues with caching due to that.
- It will be much simpler to support more complicated comparisons through
  class-based methods. Rather than comparing entire (mutable) maps or pulling
  out multiple items from a map and comparing on those, we can rely on default
  comparison and implement it on the class.
- With typed process input, it would be possible to check attribute use in
  processes at compile time rather than running into errors at runtime or
  experiencing null values in processes. Something like:

I'm sure we can find more advantages that I can't think of right now.
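To make the immutability and comparison points a bit more concrete, here is a minimal sketch in plain Python rather than Nextflow, since Nextflow has no such construct today. The class name `PairedReads`, its fields, and the choice to compare on `sample_id` alone are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field, FrozenInstanceError
from pathlib import Path


@dataclass(frozen=True)
class PairedReads:
    """Hypothetical value object bundling meta information with its files."""

    sample_id: str
    # Excluded from comparison: equality is defined on sample_id alone.
    pair_1: Path = field(compare=False)
    pair_2: Path = field(compare=False)


a = PairedReads("s1", Path("s1_R1.fastq.gz"), Path("s1_R2.fastq.gz"))
b = PairedReads("s1", Path("trimmed/s1_R1.fastq.gz"), Path("trimmed/s1_R2.fastq.gz"))

# Default comparison only looks at the fields the class opts in to.
same_sample = (a == b)

# Attributes cannot be reassigned after construction.
try:
    a.sample_id = "s2"
    mutated = True
except FrozenInstanceError:
    mutated = False
```

Because the comparison lives on the class, a channel operation that groups or joins on the object would not need to pull individual keys out of a map first.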
The big problem is, of course, that at the moment Nextflow will not stage any
of the files defined as attributes of such an object. However, perhaps Nextflow
could implement and expose a base class: when any child of that base class is
used as a process input, all of its attributes are checked and files are staged
appropriately.
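The base-class idea could work roughly as follows. This is again a plain Python sketch, not Nextflow code: `StagedInput`, `files_to_stage`, and `Sample` are hypothetical names, and collecting the paths into a list stands in for whatever staging Nextflow would actually perform.

```python
from dataclasses import dataclass, fields
from pathlib import Path
from typing import List


@dataclass(frozen=True)
class StagedInput:
    """Hypothetical base class: subclasses declare files as Path attributes."""

    def files_to_stage(self) -> List[Path]:
        # Inspect every declared attribute and keep the ones that are paths,
        # including paths held inside a list or tuple attribute.
        found = []
        for f in fields(self):
            value = getattr(self, f.name)
            if isinstance(value, Path):
                found.append(value)
            elif isinstance(value, (list, tuple)):
                found.extend(v for v in value if isinstance(v, Path))
        return found


@dataclass(frozen=True)
class Sample(StagedInput):
    sample_id: str
    pair_1: Path
    pair_2: Path


sample = Sample("s1", Path("s1_R1.fastq.gz"), Path("s1_R2.fastq.gz"))
to_stage = sample.files_to_stage()
```

A runtime could call something like `files_to_stage()` on any process input that inherits from the base class, leaving non-file attributes such as `sample_id` untouched.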
Curious to hear what folks think. I know that we've discussed at least parts of
this with @robsyme.