-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
added seqfu stats #56
Conversation
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your contribution!
Ultimately, the pipeline will allow to flexibly choose the tools that are run, so having more tools in the pipeline is always great, but what exactly was your rationale here?
Admittedly, I just skimmed over the description, but to me, it seems that it is predominantly meant to run on FASTA files and for judging the quality of genome assemblies. Unless I missed some relevant arguments, its application on sequencing reads is in my opinion does not really yield too many meaningful statistics (at least for Illumina, for Nanopore probably yes)
Do you happen to have a Nanopore example? For Illumina, this is basically all you get:
┌──────────────────────────┬──────────┬────────────┬───────┬─────┬─────┬─────┬───────┬─────┬─────┐
│ File │ #Seq │ Total bp │ Avg │ N50 │ N75 │ N90 │ auN │ Min │ Max │
├──────────────────────────┼──────────┼────────────┼───────┼─────┼─────┼─────┼───────┼─────┼─────┤
│ S11_L001_R1_001.fastq.gz │ 35875355 │ 5417178605 │ 151.0 │ 151 │ 151 │ 151 │ 0.000 │ 151 │ 151 │
│ S11_L001_I1_001.fastq.gz │ 35875355 │ 322878195 │ 9.0 │ 9 │ 9 │ 9 │ 0.000 │ 9 │ 9 │
│ S11_L001_R2_001.fastq.gz │ 35875355 │ 5417178605 │ 151.0 │ 151 │ 151 │ 151 │ 0.000 │ 151 │ 151 │
└──────────────────────────┴──────────┴────────────┴───────┴─────┴─────┴─────┴───────┴─────┴─────┘
But with regard to the code, this already looks very good and tidy!
I am only missing a config for the module with a publishDir
directive, so that the reports from the stats output channel are published in a subfolder of the outdir
. Furthermore, there are possibly relevant ext.args
that I overlooked and that you used for the reads?
Hi @MatthiasZepper, thank you for your comments! To be fair, I took up this issue/PR to "prepare" the pipeline with as many new modules as possible. I agree that this tool is more useful with Nanopore reads. Maybe this PR can be "frozen" until the seqinspector pipeline has a workflow for Nanopore/long read methods. Or until the pipeline accepts files that are not FASTQ (FASTA, BAM, BED, etc). Unfortunately, I don't have a Nanopore example, I'll see if there is some nanopore dataset on the test-datasets repo so I can see what the stats look like. I'll add now the config for the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your contribution! LGTM!
Added SeqFu stats module
PR checklist
nf-core lint
).nf-test test main.nf.test -profile test,docker
).nextflow run . -profile debug,test,docker --outdir <OUTDIR>
).docs/usage.md
is updated.docs/output.md
is updated.CHANGELOG.md
is updated.README.md
is updated (including new tool citations and authors/contributors).