StringTie prepDE.py script needs length argument for calculating raw counts #266

JohnHadish · 2022-11-14T22:59:46Z

Description of the bug

Impacts Hisat2 and Star aligner.
The prepDE.py script is used to calculate raw counts. It has a parameter called -l (length) which requires the user to input the average length of each read. It defaults to 75 bp. Currently, GEMmaker does not reset this parameter. This will result in incorrect raw read counts being reported to the user.

the code for prepDE.py

http://ccb.jhu.edu/software/stringtie/dl/prepDE.py3

the code which needs to be changed:

https://github.com/SystemsGenetics/GEMmaker/blob/master/modules/local/stringtie_fpkm_tpm.nf

Suggested change:
pass an argument with readlength to stringtie, use this parameter

Command used and terminal output

No response

Relevant files

No response

System information

No response

The text was updated successfully, but these errors were encountered:

spficklin · 2022-11-15T00:12:05Z

This is a problem and should be fixed, but I don't think it means GEMmaker is delivering counts that are not "correct". Even if the average read length was given properly (not the 75bp default), the count is still technically not "correct". It's an estimate using the formula reads_per_transcript = coverage * transcript_len / read_len. For example, for trimmed reads you can still have those that are shorter or longer than the average so the count won't be perfect.

I think the problem isn't with the counts being incorrect, it's more of a challenge of being able to compare across samples that have different average read lengths.

If your input files are all from the same experiment odds are all of the read lengths across the samples will be the same -- no problem.

If your input files are from samples that have different read lengths then a cross-sample normalization, I think, should clear things up. Folks with larger datasets should be checking for this and normalizing.

if your sample file is absolutely huge, then it's hard to normalize and there may be some comparison problems. But correlation analysis should still be okay (with some added noise).

JohnHadish added the bug label Nov 14, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

StringTie prepDE.py script needs length argument for calculating raw counts #266

StringTie prepDE.py script needs length argument for calculating raw counts #266

JohnHadish commented Nov 14, 2022

spficklin commented Nov 15, 2022

StringTie prepDE.py script needs length argument for calculating raw counts #266

StringTie prepDE.py script needs length argument for calculating raw counts #266

Comments

JohnHadish commented Nov 14, 2022

Description of the bug

Command used and terminal output

Relevant files

System information

spficklin commented Nov 15, 2022