Merge branch 'dev' into add_fq_lint_to_pipeline
adamrtalbot authored Dec 17, 2024
2 parents ad5fd9b + 83539bd commit aad2abd
Showing 34 changed files with 968 additions and 41 deletions.
16 changes: 4 additions & 12 deletions .github/workflows/nf-test.yml
@@ -5,21 +5,13 @@ on: [push, pull_request]
jobs:
test:
runs-on: ubuntu-latest
strategy:
matrix:
shard: [1, 2, 3, 4]

steps:
- name: Checkout
uses: actions/checkout@v4

- name: Set up JDK 11
uses: actions/setup-java@v2
with:
java-version: "11"
distribution: "adopt"

- name: Setup Nextflow latest-edge
uses: nf-core/setup-nextflow@v1
uses: nf-core/setup-nextflow@v2
with:
version: "latest-edge"

@@ -28,5 +20,5 @@ jobs:
wget -qO- https://get.nf-test.com | bash
sudo mv nf-test /usr/local/bin/
- name: Run Tests (Shard ${{ matrix.shard }}/${{ strategy.job-total }})
run: nf-test test --ci --shard ${{ matrix.shard }}/${{ strategy.job-total }} .
- name: Run Tests
run: nf-test test --ci tests
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -12,10 +12,12 @@ Initial release of nf-core/seqinspector, created with the [nf-core](https://nf-c
- [#20](https://github.com/nf-core/seqinspector/pull/20) Use tags to generate group reports
- [#13](https://github.com/nf-core/seqinspector/pull/13) Generate reports per run, per project and per lane.
- [#49](https://github.com/nf-core/seqinspector/pull/49) Merge with template 3.0.2.
- [#56](https://github.com/nf-core/seqinspector/pull/56) Added SeqFu stats module.
- [#50](https://github.com/nf-core/seqinspector/pull/50) Add an optional subsampling step.
- [#51](https://github.com/nf-core/seqinspector/pull/51) Add nf-test to CI.
- [#63](https://github.com/nf-core/seqinspector/pull/63) Contribution guidelines added about displaying results for new tools
- [#67](https://github.com/nf-core/seqinspector/pull/67) Add FASTQ linting for early validation
- [#53](https://github.com/nf-core/seqinspector/pull/53) Add FastQ-Screen database multiplexing and limit scope of nf-test in CI.

### `Fixed`

8 changes: 8 additions & 0 deletions CITATIONS.md
@@ -16,6 +16,14 @@

> Andrews, S. (2010). FastQC: A Quality Control Tool for High Throughput Sequence Data [Online].
- [SeqFu](https://telatin.github.io/seqfu2/)

> Telatin A, Fariselli P, Birolo G. SeqFu: A Suite of Utilities for the Robust and Reproducible Manipulation of Sequence Files. Bioengineering 2021, 8, 59. doi.org/10.3390/bioengineering8050059
- [FastQ Screen](https://www.bioinformatics.babraham.ac.uk/projects/fastq_screen/)

> Wingett SW and Andrews S. FastQ Screen: A tool for multi-genome mapping and quality control [version 2; referees: 4 approved]. F1000Research 2018, 7:1338 (https://doi.org/10.12688/f1000research.15931.2)
- [MultiQC](https://pubmed.ncbi.nlm.nih.gov/27312411/)

> Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924.
4 changes: 4 additions & 0 deletions assets/example_fastq_screen_references.csv
@@ -0,0 +1,4 @@
name,dir,basename,aligner
Ecoli,s3://ngi-igenomes/igenomes/Escherichia_coli_K_12_MG1655/NCBI/2001-10-15/Sequence/Bowtie2Index/,genome,bowtie2
PhiX,s3://ngi-igenomes/igenomes/PhiX/Illumina/RTA/Sequence/Bowtie2Index/,genome,bowtie2
Scerevisiae,s3://ngi-igenomes/igenomes/Saccharomyces_cerevisiae/NCBI/build3.1/Sequence/Bowtie2Index/,genome,bowtie2
35 changes: 35 additions & 0 deletions assets/schema_fastq_screen_references.json
@@ -0,0 +1,35 @@
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"$id": "https://raw.githubusercontent.com/nf-core/seqinspector/master/assets/schema_fastq_screen_references.json",
"title": "nf-core/seqinspector pipeline - params.fastq_screen_references schema",
"description": "Schema for the file provided with params.fastq_screen_references",
"type": "array",
"items": {
"type": "object",
"properties": {
"name": {
"type": "string",
"pattern": "^\\S+$",
"errorMessage": "The reference name as referred to by FastQ Screen."
},
"dir": {
"type": "string",
"format": "file-path",
"exists": true,
"pattern": "^\\S+$",
"errorMessage": "Path to the dir containing the aligner reference and index. Can be remote."
},
"basename": {
"type": "string",
"pattern": "^\\S+$",
"errorMessage": "The shared basename of the reference and index files contained in the dir."
},
"aligner": {
"type": "string",
"enum": ["bowtie", "bowtie2", "bwa", "minimap2"],
"errorMessage": "Specify the aligner to use for the mapping. Valid arguments are 'bowtie', 'bowtie2' (default), 'bwa' or 'minimap2'."
}
},
"required": ["name", "dir", "basename", "aligner"]
}
}
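The schema above boils down to a few per-row rules: four required columns, no whitespace in `name`, `dir` or `basename`, and an `aligner` drawn from a fixed list. A minimal Python sketch of those rules, using only the standard library, may help when preparing a references CSV by hand. It is not how the pipeline validates the file (it does not run a JSON Schema validator), it skips the schema's `exists` check on `dir` because remote `s3://` paths would need network access, and the function name `validate_references` is hypothetical.

```python
import csv
import io
import re

# Per-field rules copied by hand from assets/schema_fastq_screen_references.json
REQUIRED_COLUMNS = ("name", "dir", "basename", "aligner")
NO_WHITESPACE = re.compile(r"\S+")  # used with fullmatch, equivalent to "^\\S+$"
ALIGNERS = {"bowtie", "bowtie2", "bwa", "minimap2"}


def validate_references(csv_text):
    """Return a list of error messages for a references CSV; an empty list means valid.

    Hypothetical helper: no JSON Schema engine, and no `exists` check on `dir`.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        return [f"missing column(s): {', '.join(missing)}"]

    errors = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        for field in ("name", "dir", "basename"):
            # DictReader fills absent trailing fields with None
            if not NO_WHITESPACE.fullmatch(row[field] or ""):
                errors.append(f"line {line_no}: '{field}' is empty or contains whitespace")
        if row["aligner"] not in ALIGNERS:
            errors.append(f"line {line_no}: unsupported aligner '{row['aligner']}'")
    return errors
```

Run against the three rows of `assets/example_fastq_screen_references.csv`, this sketch should return an empty list.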
9 changes: 9 additions & 0 deletions conf/modules.config
@@ -35,6 +35,15 @@ process {
ext.args = '--quiet'
}

withName: 'SEQFU_STATS' {
ext.args = ''
publishDir = [
path: { "${params.outdir}/seqfu_stats" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
}

withName: 'MULTIQC_GLOBAL' {
ext.args = { params.multiqc_title ? "--title \"$params.multiqc_title\"" : '' }
publishDir = [
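In the `SEQFU_STATS` block of `conf/modules.config`, the `saveAs` closure is what keeps `versions.yml` out of the published results: Nextflow does not publish a file when `saveAs` returns `null`, and otherwise uses the returned name under the `publishDir` path. A Python transliteration of that rule, for illustration only; the function `published_path` is hypothetical and models just the destination path, not the copy or link behaviour selected by `params.publish_dir_mode`.

```python
def published_path(outdir, filename):
    """Mirror the SEQFU_STATS publishDir settings: destination path, or None if skipped."""
    if filename == "versions.yml":
        # saveAs returns null for this file, so Nextflow does not publish it
        return None
    # saveAs returns the name unchanged, placed under "${params.outdir}/seqfu_stats"
    return f"{outdir}/seqfu_stats/{filename}"
```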
1 change: 1 addition & 0 deletions conf/test.config
@@ -26,6 +26,7 @@ params {
// TODO nf-core: Specify the paths to your test data on nf-core/test-datasets
// TODO nf-core: Give any required params for the test so that command line flags are not needed
input = params.pipelines_testdata_base_path + 'seqinspector/testdata/NovaSeq6000/samplesheet.csv'
fastq_screen_references = "${projectDir}/assets/example_fastq_screen_references.csv"

// Genome references
genome = 'R64-1-1'
40 changes: 40 additions & 0 deletions docs/output.md
@@ -12,6 +12,8 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d

- [Seqtk](#seqtk) - Subsample a specific number of reads per sample
- [FastQC](#fastqc) - Raw read QC
- [SeqFu Stats](#seqfu-stats) - Statistics for FASTA or FASTQ files
- [Fastqscreen](#fastqscreen) - Mapping against a set of references for basic contamination QC
- [MultiQC](#multiqc) - Aggregate report describing results and QC from the whole pipeline
- [Pipeline information](#pipeline-information) - Report metrics generated during the workflow execution

@@ -40,6 +42,44 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d

[FastQC](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) gives general quality metrics about your sequenced reads. It provides information about the quality score distribution across your reads, per base sequence content (%A/T/G/C), adapter contamination and overrepresented sequences. For further reading and documentation see the [FastQC help pages](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/).

### FASTQSCREEN

<details markdown="1">
<summary>Output files</summary>

- `fastqscreen/`
- `*_screen.html`: Interactive graphical FastQ Screen report summarising the mapping of your sequences against each of your libraries.
- `*_screen.pdf`: Static graphical FastQ Screen report summarising the mapping of your sequences against each of your libraries.
- `*_screen.txt`: Text-based FastQ Screen report summarising the mapping of your sequences against each of your libraries.

</details>

[Fastqscreen](https://www.bioinformatics.babraham.ac.uk/projects/fastq_screen/) allows you to set up a standard set of libraries against which all of your sequences can be searched. Your search libraries might contain the genomes of all of the organisms you work on, along with PhiX, Vectors or other contaminants commonly seen in sequencing experiments.

It requires a `.csv` detailing:

- the working name of the reference
- the name of the aligner used to generate its index (FastQ Screen maps the reads with this same aligner and index)
- the file basename of the reference and its index (e.g. the reference `genome.fa` and its Bowtie2 index files `genome.1.bt2`, `genome.2.bt2`, etc. all share the basename `genome`)
- the path to a directory where the reference and index files both reside.
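To make the basename rule concrete for the default aligner: `bowtie2-build` writes six index files, all starting with the basename. The hypothetical helper below is a sketch that covers only Bowtie2 and only small indexes (large indexes use `.bt2l` instead of `.bt2`); it lists which of those files are missing from a directory listing.

```python
# Index files written by bowtie2-build for a small (.bt2) index.
BOWTIE2_SUFFIXES = (".1.bt2", ".2.bt2", ".3.bt2", ".4.bt2", ".rev.1.bt2", ".rev.2.bt2")


def missing_bowtie2_files(filenames, basename):
    """Return the expected Bowtie2 index files for `basename` absent from `filenames`."""
    present = set(filenames)
    return [basename + suffix for suffix in BOWTIE2_SUFFIXES if basename + suffix not in present]
```

For a local directory, `os.listdir(path)` can be passed as `filenames`; an empty result suggests the `basename` column matches the Bowtie2 index in that directory.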

See `assets/example_fastq_screen_references.csv` for an example.

The `.csv` is passed to the pipeline with the `fastq_screen_references` parameter. The pipeline uses it to build a `FastQ Screen` configuration file inside the process work directory, so that the configuration points to the references as they are staged there.
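The conversion from CSV to configuration file might look roughly like the Python sketch below. It is not the pipeline's code (that lives in the patched nf-core `fastqscreen/fastqscreen` module, which this page does not render), and it rests on two assumptions: each reference directory is staged into the work directory under a sub-directory named after the reference, and FastQ Screen reads tab-separated `DATABASE <name> <index basename path>` lines, which is the layout we assume from its example configuration file. The `aligner` column is left out of the generated lines here (FastQ Screen also has an `--aligner` command-line option). The function name `fastq_screen_conf` is hypothetical.

```python
import csv
import io


def fastq_screen_conf(csv_text):
    """Build FastQ Screen DATABASE lines plus a staging plan from the references CSV.

    Hypothetical layout: reference `name` is staged as ./<name>/ in the work
    directory, so its index prefix becomes ./<name>/<basename>.
    """
    conf_lines = []
    staging = {}  # staged sub-directory -> original (possibly remote) dir
    for row in csv.DictReader(io.StringIO(csv_text)):
        staging[row["name"]] = row["dir"]
        index_prefix = f"./{row['name']}/{row['basename']}"
        conf_lines.append("\t".join(["DATABASE", row["name"], index_prefix]))
    return "\n".join(conf_lines) + "\n", staging
```

Staging under the reference name rather than the directory's last path component avoids a collision in the example CSV, where all three `dir` values end in `Bowtie2Index/`.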

### SeqFu Stats

<details markdown="1">
<summary>Output files</summary>

- `seqfu_stats/`
- `*.tsv`: Tab-separated file containing quality metrics.
- `*_mqc.txt`: File containing the same quality metrics as the TSV file, ready to be read by MultiQC.

</details>

[SeqFu](https://telatin.github.io/seqfu2/) is a general-purpose program to manipulate and parse information from FASTA/FASTQ files, supporting gzipped input files. It includes functions to interleave and de-interleave FASTQ files, to rename sequences and to count and print statistics on sequence lengths. In this pipeline, the `seqfu stats` module is used to produce general quality statistics.
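As an illustration of the kind of sequence-length statistics `seqfu stats` reports, here is a plain-Python sketch that reads plain or gzipped FASTQ and computes count, total, minimum, maximum and mean read length. It is not SeqFu's implementation and does not reproduce its exact set of columns; it assumes strict four-line FASTQ records without wrapped sequence lines, and `open_fastq` and `length_stats` are hypothetical names.

```python
import gzip


def open_fastq(path):
    """Open a FASTQ file as text, decompressing it if the name ends in .gz."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def length_stats(lines):
    """Summarise read lengths from FASTQ lines (assumes 4 lines per record)."""
    # Line 2 of every record (index 1, 5, 9, ...) holds the sequence.
    lengths = [len(line.rstrip("\r\n")) for i, line in enumerate(lines) if i % 4 == 1]
    if not lengths:
        return {"count": 0, "total": 0, "min": 0, "max": 0, "mean": 0.0}
    total = sum(lengths)
    return {
        "count": len(lengths),
        "total": total,
        "min": min(lengths),
        "max": max(lengths),
        "mean": total / len(lengths),
    }
```

Typical use would be `with open_fastq("sample_R1.fastq.gz") as handle: print(length_stats(handle))`, where the file name is only a placeholder.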

### MultiQC

nf-core/seqinspector will generate the following MultiQC reports:
11 changes: 11 additions & 0 deletions modules.json
@@ -15,11 +15,22 @@
"git_sha": "a1abf90966a2a4016d3c3e41e228bfcbd4811ccc",
"installed_by": ["modules"]
},
"fastqscreen/fastqscreen": {
"branch": "master",
"git_sha": "666652151335353eef2fcd58880bcef5bc2928e1",
"installed_by": ["modules"],
"patch": "modules/nf-core/fastqscreen/fastqscreen/fastqscreen-fastqscreen.diff"
},
"multiqc": {
"branch": "master",
"git_sha": "cf17ca47590cc578dfb47db1c2a44ef86f89976d",
"installed_by": ["modules"]
},
"seqfu/stats": {
"branch": "master",
"git_sha": "666652151335353eef2fcd58880bcef5bc2928e1",
"installed_by": ["modules"]
},
"seqtk/sample": {
"branch": "master",
"git_sha": "666652151335353eef2fcd58880bcef5bc2928e1",
14 changes: 14 additions & 0 deletions modules/nf-core/fastqscreen/fastqscreen/environment.yml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

58 changes: 58 additions & 0 deletions modules/nf-core/fastqscreen/fastqscreen/main.nf
