Workflow output definition #275

Closed
wants to merge 17 commits into from
Changes from 14 commits
28 changes: 28 additions & 0 deletions assets/schema_mappings.yml
@@ -0,0 +1,28 @@
$schema: 'http://json-schema.org/draft-07/schema'
$id: 'https://raw.githubusercontent.com/nf-core/fetchngs/master/assets/schema_mappings.yml'
title: 'nf-core/fetchngs pipeline - id_mappings.csv schema'
description: 'Schema for the mappings file produced by fetchngs'
type: array
items:
  type: object
  properties:
    sample:
      type: string
    experiment_accession:
      type: string
    run_accession:
      type: string
    sample_accession:
      type: string
    experiment_alias:
      type: string
    run_alias:
      type: string
    sample_alias:
      type: string
    experiment_title:
      type: string
    sample_title:
      type: string
    sample_description:
      type: string
81 changes: 81 additions & 0 deletions assets/schema_samplesheet.yml
@@ -0,0 +1,81 @@
$schema: 'http://json-schema.org/draft-07/schema'
$id: 'https://raw.githubusercontent.com/nf-core/fetchngs/master/assets/schema_samplesheet.yml'
title: 'nf-core/fetchngs pipeline - samplesheet.csv schema'
description: 'Schema for the samplesheet file produced by fetchngs'
type: array
items:
  type: object
  properties:
    sample:
      type: string
    fastq_1:
      type: string
      format: file-path
      pattern: '^\S+\.f(ast)?q\.gz$'
    fastq_2:
      type: string
      format: file-path
      pattern: '^\S+\.f(ast)?q\.gz$'
    run_accession:
      type: string
    experiment_accession:
      type: string
    sample_accession:
      type: string
    secondary_sample_accession:
      type: string
    study_accession:
      type: string
    secondary_study_accession:
      type: string
    submission_accession:
      type: string
    run_alias:
      type: string
    experiment_alias:
      type: string
    sample_alias:
      type: string
    study_alias:
      type: string
    library_layout:
      type: string
    library_selection:
      type: string
    library_source:
      type: string
    library_strategy:
      type: string
    library_name:
      type: string
    instrument_model:
      type: string
    instrument_platform:
      type: string
    base_count:
      type: integer
    read_count:
      type: integer
    tax_id:
      type: string
    scientific_name:
      type: string
    sample_title:
      type: string
    experiment_title:
      type: string
    study_title:
      type: string
    sample_description:
      type: string
    fastq_md5:
      type: string
      pattern: '^[0-9a-f]{32}$'
    fastq_bytes:
      type: integer
    fastq_ftp:
      type: string
    fastq_galaxy:
      type: string
    fastq_aspera:
      type: string
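
A quick way to sanity-check the two `pattern` entries in this schema is to run them through a regex engine. The sketch below uses Python's `re` module, which is an assumption on my part (the schema does not prescribe a validator), and it assumes the intended FASTQ regex is `^\S+\.f(ast)?q\.gz$`. Single-quoted YAML scalars do not process backslash escapes, so a doubled backslash in such a scalar reaches the regex engine as a literal backslash. The file names and checksum are made up for illustration.

```python
import re

# Intended patterns, as they should reach the regex engine.
FASTQ_PATTERN = re.compile(r"^\S+\.f(ast)?q\.gz$")
MD5_PATTERN = re.compile(r"^[0-9a-f]{32}$")

# What a single-quoted YAML scalar with doubled backslashes yields:
# the regex now demands literal backslash characters in the value.
DOUBLED_PATTERN = re.compile(r"^\\S+\\.f(ast)?q\\.gz$")


def is_fastq(path: str) -> bool:
    """Return True if path looks like a gzipped FASTQ file name."""
    return FASTQ_PATTERN.match(path) is not None


def is_md5(value: str) -> bool:
    """Return True if value is a 32-character lowercase hex digest."""
    return MD5_PATTERN.match(value) is not None


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    for name in ["SRX000001_1.fastq.gz", "reads.fq.gz", "reads.fastq", "my reads.fq.gz"]:
        print(name, is_fastq(name))
    print(is_md5("d41d8cd98f00b204e9800998ecf8427e"))
    print(DOUBLED_PATTERN.match("reads.fq.gz") is None)
```

With the doubled form, an ordinary name such as `reads.fq.gz` no longer matches, which is worth checking against whatever validator ends up consuming these schemas.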
6 changes: 0 additions & 6 deletions conf/base.config
@@ -14,12 +14,6 @@ process {
     memory = { check_max( 6.GB * task.attempt, 'memory' ) }
     time = { check_max( 4.h * task.attempt, 'time' ) }
 
-    publishDir = [
-        path: { "${params.outdir}/${task.process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()}" },
-        mode: params.publish_dir_mode,
-        saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
-    ]
-
     errorStrategy = { task.exitStatus in ((130..145) + 104) ? 'retry' : 'finish' }
     maxRetries = 1
     maxErrors = '-1'
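
For reference, the default `publishDir` removed from `conf/base.config` derived each process's output directory from its name and skipped `versions.yml`. A rough Python equivalent is sketched below. The helper names are mine, and the fully qualified process names are illustrative guesses at how Nextflow would report them, not values taken from a run.

```python
from typing import List, Optional


def tokenize(text: str, delimiter: str) -> List[str]:
    """Split like Groovy's String.tokenize(): empty tokens are dropped."""
    return [token for token in text.split(delimiter) if token]


def default_publish_subdir(process_name: str) -> str:
    """Mirror task.process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()."""
    simple_name = tokenize(process_name, ":")[-1]  # drop enclosing workflow scopes
    return tokenize(simple_name, "_")[0].lower()   # first word, lowercased


def save_as(filename: str) -> Optional[str]:
    """Mirror the saveAs closure: versions.yml is never published."""
    return None if filename == "versions.yml" else filename


if __name__ == "__main__":
    # Hypothetical fully qualified process names, for illustration only.
    for name in ["NFCORE_FETCHNGS:SRA:ASPERA_CLI", "NFCORE_FETCHNGS:SRA:SRA_FASTQ_FTP"]:
        print(name, "->", default_publish_subdir(name))
```

Under this rule `ASPERA_CLI` would publish to `aspera/` and `SRA_FASTQ_FTP` to `sra/`, which is presumably why the per-module configs removed in this PR overrode the path to `fastq/`.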
34 changes: 34 additions & 0 deletions main.nf
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@
*/

nextflow.enable.dsl = 2
nextflow.preview.topic = true

/*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Expand All @@ -34,6 +35,13 @@ workflow NFCORE_FETCHNGS {
//
SRA ( ids )

emit:
samplesheet = SRA.out.samplesheet
mappings = SRA.out.mappings
sample_mappings = SRA.out.sample_mappings
sra_metadata = SRA.out.sra_metadata
versions = SRA.out.versions

}

/*
Expand Down Expand Up @@ -83,6 +91,32 @@ workflow {
)
}

output {
directory params.outdir, mode: params.publish_dir_mode

'fastq' {
from 'fastq'
}
Member

OK, so the first 'fastq' is the path and the second is the topic.
Can it be a bit more explicit?

I'm thinking of something like this at least:

Suggested change
-    'fastq' {
-        from 'fastq'
-    }
+    'fastq/' {
+        from 'fastq'
+    }

Author

You can specify a trailing slash if you want.

Member

I'd like that, since it makes the path more explicit. I'd still rather have both a path and a topic named explicitly somewhere, to avoid confusion.

Author

@bentsherman, Apr 4, 2024

That's fair. This case is simple enough to be confusing, but if you used a regular emit instead of a topic, or you had a more complex directory structure like in rnaseq, it would be clearer.

I did propose calling it fromTopic to help denote it as a topic, but Paolo was in favor of not having too many different keywords. We could revisit this.


+    'fastq/md5' {
+        from 'md5'
+    }
+
+    'metadata' {
+        from 'runinfo-tsv'
+    }
+
+    'pipeline_info' {
+        from 'versions-yml'
+    }
+
+    'samplesheet' {
+        from NFCORE_FETCHNGS.out.samplesheet // , schema: 'assets/schema_samplesheet.yml'
+        from NFCORE_FETCHNGS.out.mappings // , schema: 'assets/schema_mappings.yml'
bentsherman marked this conversation as resolved.
from NFCORE_FETCHNGS.out.sample_mappings
}
}

/*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
THE END
Expand Down
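
To make the path-versus-topic distinction raised in the review thread concrete, here is a minimal Python model of the `output` block in `main.nf`. It is not how Nextflow implements the feature. The field names `path` and `topic` are hypothetical, chosen to reflect the reviewer's request for explicit naming, and `results` is a placeholder for `params.outdir`. The workflow-emit sources under `'samplesheet'` are left out for brevity.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class OutputRule:
    """One entry of the output block: a target path fed by one topic."""

    path: str   # directory relative to the output directory
    topic: str  # name of the topic whose files are published there

    def target_dir(self, outdir: str) -> PurePosixPath:
        # PurePosixPath drops a trailing slash, so 'fastq' and 'fastq/' agree.
        return PurePosixPath(outdir) / self.path


# The topic-based rules from the output block, with path and topic spelled out.
RULES = [
    OutputRule(path="fastq/", topic="fastq"),
    OutputRule(path="fastq/md5", topic="md5"),
    OutputRule(path="metadata", topic="runinfo-tsv"),
    OutputRule(path="pipeline_info", topic="versions-yml"),
]


def destination(topic: str, filename: str, outdir: str = "results") -> PurePosixPath:
    """Return where a file sent to `topic` would be published."""
    for rule in RULES:
        if rule.topic == topic:
            return rule.target_dir(outdir) / filename
    raise KeyError(f"no output rule for topic {topic!r}")
```

In this model the trailing slash is purely cosmetic, while naming the two fields removes the ambiguity of `'fastq' { from 'fastq' }`.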
12 changes: 0 additions & 12 deletions modules/local/aspera_cli/nextflow.config
@@ -1,17 +1,5 @@
 process {
     withName: 'ASPERA_CLI' {
         ext.args = '-QT -l 300m -P33001'
-        publishDir = [
-            [
-                path: { "${params.outdir}/fastq" },
-                mode: params.publish_dir_mode,
-                pattern: "*.fastq.gz"
-            ],
-            [
-                path: { "${params.outdir}/fastq/md5" },
-                mode: params.publish_dir_mode,
-                pattern: "*.md5"
-            ]
-        ]
     }
 }
9 changes: 0 additions & 9 deletions modules/local/multiqc_mappings_config/nextflow.config

This file was deleted.

12 changes: 0 additions & 12 deletions modules/local/sra_fastq_ftp/nextflow.config
@@ -1,17 +1,5 @@
 process {
     withName: 'SRA_FASTQ_FTP' {
         ext.args = '-t 5 -nv -c -T 60'
-        publishDir = [
-            [
-                path: { "${params.outdir}/fastq" },
-                mode: params.publish_dir_mode,
-                pattern: "*.fastq.gz"
-            ],
-            [
-                path: { "${params.outdir}/fastq/md5" },
-                mode: params.publish_dir_mode,
-                pattern: "*.md5"
-            ]
-        ]
     }
 }
8 changes: 0 additions & 8 deletions modules/local/sra_ids_to_runinfo/nextflow.config

This file was deleted.

9 changes: 0 additions & 9 deletions modules/local/sra_runinfo_to_ftp/nextflow.config

This file was deleted.

8 changes: 0 additions & 8 deletions modules/local/sra_to_samplesheet/nextflow.config

This file was deleted.

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

5 changes: 0 additions & 5 deletions modules/nf-core/sratools/fasterqdump/nextflow.config

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

8 changes: 0 additions & 8 deletions modules/nf-core/sratools/prefetch/nextflow.config

This file was deleted.

16 changes: 16 additions & 0 deletions output.yml
Author

I feel like this top-level schema file could be automatically generated from the output DSL (e.g. with a Nextflow command).

@@ -0,0 +1,16 @@
$schema: 'http://json-schema.org/draft-07/schema'
$id: 'https://raw.githubusercontent.com/nf-core/fetchngs/master/output.yml'
title: 'nf-core/fetchngs pipeline outputs'
description: ''
type: object
properties:
  id_mappings:
    type: string
    format: file-path
    mimetype: text/csv
    schema: assets/schema_mappings.yml
  samplesheet:
    type: string
    format: file-path
    mimetype: text/csv
    schema: assets/schema_samplesheet.yml
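
The author's remark that this file could be generated automatically can be illustrated with a short Python sketch. No such Nextflow command appears in this PR. The function name and its inputs are hypothetical, and the sketch assumes every output is a CSV file with its own schema, as in `output.yml`. It emits JSON rather than YAML only to stay within the standard library.

```python
import json
from typing import Dict


def build_output_schema(pipeline: str, outputs: Dict[str, str]) -> dict:
    """Build the top-level outputs schema from {output name: schema path}."""
    properties = {
        name: {
            "type": "string",
            "format": "file-path",
            "mimetype": "text/csv",  # assumption: all declared outputs are CSV
            "schema": schema_path,
        }
        for name, schema_path in outputs.items()
    }
    return {
        "$schema": "http://json-schema.org/draft-07/schema",
        "$id": f"https://raw.githubusercontent.com/{pipeline}/master/output.yml",
        "title": f"{pipeline} pipeline outputs",
        "description": "",
        "type": "object",
        "properties": properties,
    }


if __name__ == "__main__":
    schema = build_output_schema(
        "nf-core/fetchngs",
        {
            "id_mappings": "assets/schema_mappings.yml",
            "samplesheet": "assets/schema_samplesheet.yml",
        },
    )
    print(json.dumps(schema, indent=2))
```

The mapping passed in here is exactly the information carried by the commented-out `schema:` options in the `output` block, which is what makes generation plausible.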

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

15 changes: 12 additions & 3 deletions workflows/sra/main.nf
@@ -123,6 +123,7 @@ workflow SRA {
             .fastq
             .mix(SRA_FASTQ_FTP.out.fastq)
             .mix(FASTQ_DOWNLOAD_PREFETCH_FASTERQDUMP_SRATOOLS.out.reads)
+            .tap { ch_fastq }
             .map {
                 meta, fastq ->
                     def reads = fastq instanceof List ? fastq.flatten() : [ fastq ]
@@ -153,7 +154,7 @@ workflow SRA {
             .map { it[1] }
             .collectFile(name:'tmp_samplesheet.csv', newLine: true, keepHeader: true, sort: { it.baseName })
             .map { it.text.tokenize('\n').join('\n') }
-            .collectFile(name:'samplesheet.csv', storeDir: "${params.outdir}/samplesheet")
+            .collectFile(name:'samplesheet.csv')
             .set { ch_samplesheet }
 
         SRA_TO_SAMPLESHEET
@@ -162,7 +163,7 @@ workflow SRA {
             .map { it[1] }
             .collectFile(name:'tmp_id_mappings.csv', newLine: true, keepHeader: true, sort: { it.baseName })
             .map { it.text.tokenize('\n').join('\n') }
-            .collectFile(name:'id_mappings.csv', storeDir: "${params.outdir}/samplesheet")
+            .collectFile(name:'id_mappings.csv')
             .set { ch_mappings }
 
     //
@@ -181,7 +182,15 @@ workflow SRA {
     // Collate and save software versions
     //
     softwareVersionsToYAML(ch_versions)
-        .collectFile(storeDir: "${params.outdir}/pipeline_info", name: 'nf_core_fetchngs_software_mqc_versions.yml', sort: true, newLine: true)
+        .collectFile(name: 'nf_core_fetchngs_software_mqc_versions.yml', sort: true, newLine: true)
+        .set { ch_versions_yml }
+
+    topic:
+    SRA_RUNINFO_TO_FTP.out.tsv >> 'runinfo-tsv'
+    ch_fastq >> 'fastq'
+    ASPERA_CLI.out.md5 >> 'md5'
+    SRA_FASTQ_FTP.out.md5 >> 'md5'
+    ch_versions_yml >> 'versions-yml'
 
     emit:
     samplesheet = ch_samplesheet
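
The `topic:` section in `workflows/sra/main.nf` sends two different channels (`ASPERA_CLI.out.md5` and `SRA_FASTQ_FTP.out.md5`) to the same `'md5'` topic. The toy Python model below shows the intended effect as I read it from the diff: several senders accumulate into one named stream. The class and the file names are hypothetical, and real topic channels are asynchronous, which this sketch ignores.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


class TopicRegistry:
    """Toy stand-in for topic channels: many senders, one named stream."""

    def __init__(self) -> None:
        self._topics: Dict[str, List[str]] = defaultdict(list)

    def send(self, topic: str, items: Iterable[str]) -> None:
        # Loosely corresponds to `channel >> 'topic'` in the topic: section.
        self._topics[topic].extend(items)

    def read(self, topic: str) -> List[str]:
        # Return a copy; a topic nobody sent to is simply empty.
        return list(self._topics.get(topic, []))


registry = TopicRegistry()
# Hypothetical file names standing in for the real process outputs.
registry.send("md5", ["SRR000001_1.fastq.gz.md5"])  # e.g. from ASPERA_CLI.out.md5
registry.send("md5", ["SRR000002_1.fastq.gz.md5"])  # e.g. from SRA_FASTQ_FTP.out.md5
registry.send("fastq", ["SRR000001_1.fastq.gz"])    # e.g. from ch_fastq
```

Reading `'md5'` then yields the files from both senders, which is what lets a single output rule cover both download methods.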
5 changes: 0 additions & 5 deletions workflows/sra/nextflow.config
@@ -1,8 +1,3 @@
-includeConfig "../../modules/local/multiqc_mappings_config/nextflow.config"
 includeConfig "../../modules/local/aspera_cli/nextflow.config"
 includeConfig "../../modules/local/sra_fastq_ftp/nextflow.config"
-includeConfig "../../modules/local/sra_ids_to_runinfo/nextflow.config"
-includeConfig "../../modules/local/sra_runinfo_to_ftp/nextflow.config"
-includeConfig "../../modules/local/sra_to_samplesheet/nextflow.config"
-includeConfig "../../modules/nf-core/sratools/prefetch/nextflow.config"
 includeConfig "../../subworkflows/nf-core/fastq_download_prefetch_fasterqdump_sratools/nextflow.config"