Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add subsampling #50

Merged
merged 20 commits into from
Oct 30, 2024
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,7 @@ Initial release of nf-core/seqinspector, created with the [nf-core](https://nf-c
- [#20](https://github.com/nf-core/seqinspector/pull/20) Use tags to generate group reports
- [#13](https://github.com/nf-core/seqinspector/pull/13) Generate reports per run, per project and per lane.
- [#49](https://github.com/nf-core/seqinspector/pull/49) Merge with template 3.0.2.
- [#50](https://github.com/nf-core/seqinspector/pull/50) Add an optional subsampling step.
- [#51](https://github.com/nf-core/seqinspector/pull/51) Add nf-test to CI.

### `Fixed`
Expand Down
2 changes: 2 additions & 0 deletions CITATIONS.md
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,8 @@

> Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924.

- [Seqtk](https://github.com/lh3/seqtk)

## Software packaging/containerisation tools

- [Anaconda](https://anaconda.com)
Expand Down
5 changes: 3 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -31,8 +31,9 @@
workflows use the "tube map" design for that. See https://nf-co.re/docs/contributing/design_guidelines#examples for examples. -->
<!-- TODO nf-core: Fill in short bullet-pointed list of the default steps in the pipeline -->

1. Read QC ([`FastQC`](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/))
2. Present QC for raw reads ([`MultiQC`](http://multiqc.info/))
1. Subsample reads ([`Seqtk`](https://github.com/lh3/seqtk))
2. Read QC ([`FastQC`](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/))
3. Present QC for raw reads ([`MultiQC`](http://multiqc.info/))

## Usage

Expand Down
4 changes: 4 additions & 0 deletions conf/modules.config
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,10 @@ process {
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]

withName: SEQTK_SAMPLE {
ext.args = '-s100'
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no opinion on whether the pipeline should publish the subsampled files. But if you want to output them in the way you describe in the output.md, you will probably need to define a corresponding publishDir directive here? (There is also a new possibility called Workflow Output Schema, that I am however not yet familiar with. Did you use that without me noticing it?)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think is always output because there is a general statement here:

publishDir = [
, you mean something different? I think is nice to have the reads that have been actually used in the rest of the analysis


withName: FASTQC {
ext.args = '--quiet'
}
Expand Down
13 changes: 13 additions & 0 deletions docs/output.md
Original file line number Diff line number Diff line change
Expand Up @@ -10,10 +10,23 @@ The directories listed below will be created in the results directory after the

The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes data using the following steps:

- [Seqtk](#seqtk) - Subsample a specific number of reads per sample
- [FastQC](#fastqc) - Raw read QC
- [MultiQC](#multiqc) - Aggregate report describing results and QC from the whole pipeline
- [Pipeline information](#pipeline-information) - Report metrics generated during the workflow execution

### Seqtk

<details markdown="1">
<summary>Output files</summary>

- `seqtk/`
- `*_fastq`: FastQ file after being subsampled to the sample_size value.

</details>

[Seqtk](https://github.com/lh3/seqtk) samples sequences by number.

### FastQC

<details markdown="1">
Expand Down
6 changes: 6 additions & 0 deletions docs/usage.md
Original file line number Diff line number Diff line change
Expand Up @@ -93,6 +93,12 @@ genome: 'GRCh37'

You can also generate such `YAML`/`JSON` files via [nf-core/launch](https://nf-co.re/launch).

Optionally, the `sample_size` parameter allows you to subset a random number of reads to be analysed. Note that it refers to an absolute number.

```bash
nextflow run nf-core/seqinspector --input ./samplesheet.csv --outdir ./results --sample_size 1000000 -profile docker
```

### Updating the pipeline

When you run the above command, Nextflow automatically pulls the pipeline code from GitHub and stores it as a cached version. When running the pipeline after this, it will always use the cached version if available - even if the pipeline has been updated since. To make sure that you're running the latest version of the pipeline, make sure that you regularly update the cached version of the pipeline:
Expand Down
5 changes: 5 additions & 0 deletions modules.json
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,11 @@
"branch": "master",
"git_sha": "cf17ca47590cc578dfb47db1c2a44ef86f89976d",
"installed_by": ["modules"]
},
"seqtk/sample": {
"branch": "master",
"git_sha": "666652151335353eef2fcd58880bcef5bc2928e1",
"installed_by": ["modules"]
}
}
},
Expand Down
5 changes: 5 additions & 0 deletions modules/nf-core/seqtk/sample/environment.yml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

58 changes: 58 additions & 0 deletions modules/nf-core/seqtk/sample/main.nf

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

52 changes: 52 additions & 0 deletions modules/nf-core/seqtk/sample/meta.yml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

80 changes: 80 additions & 0 deletions modules/nf-core/seqtk/sample/tests/main.nf.test

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

95 changes: 95 additions & 0 deletions modules/nf-core/seqtk/sample/tests/main.nf.test.snap

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

6 changes: 6 additions & 0 deletions modules/nf-core/seqtk/sample/tests/standard.config

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 2 additions & 0 deletions modules/nf-core/seqtk/sample/tests/tags.yml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Loading
Loading