No genes with qglobal_cv < 0.1 #69

kvn95ss · 2021-08-23T07:22:45Z

Hello,

I ran dndscv on filtered output from Mutect2 (tumor vs normal, single patient, with a PoN of 4 samples). I built the mutation list by querying the VCF file with bcftools, which gives me the columns sampleID, chr, pos, ref and mut.

I'm using the hg38 reference from the precomputed .rda file in this repo: https://github.com/im3sanger/dndscv_data/tree/master/data

I'm able to run dndscv on my data with these commands:
library("dndscv")
# Mutation table with the columns sampleID, chr, pos, ref, mut
cancer_test <- read.table("CC028_dmg_test.vcf")
# cv = NULL runs the model without covariates
cancer_processed_data <- dndscv(cancer_test, ref_db = "data/RefCDS_human_GRCh38.p12.rda", cv = NULL)
sel_cv <- cancer_processed_data$sel_cv
print(head(sel_cv), digits = 3)
I get this output:

      gene_name n_syn n_mis n_non n_spl n_ind wmis_cv wnon_cv wspl_cv wind_cv
8821   KRTAP5-4     0     1     0     0     2    46.7       0       0     772
16565   TAS2R30     0     2     0     0     1    61.4       0       0     276
7412      HLA-C     0     2     0     0     1    33.4       0       0     237
13331      PSG3     0     2     0     0     1    36.7       0       0     186
9056     LILRA4     0     2     0     0     1    31.2       0       0     177
18255  USP17L18     0     1     2     0     0    13.5     357     357       0
       pmis_cv ptrunc_cv pallsubs_cv  pind_cv qmis_cv qtrunc_cv qallsubs_cv
8821  0.016677  9.61e-01    5.69e-02 7.03e-05   0.897     0.983       0.996
16565 0.000788  9.54e-01    3.55e-03 3.48e-03   0.897     0.983       0.996
7412  0.002592  9.25e-01    1.06e-02 4.03e-03   0.897     0.983       0.996
13331 0.002184  9.08e-01    8.98e-03 5.08e-03   0.897     0.983       0.996
9056  0.002970  9.14e-01    1.19e-02 5.32e-03   0.897     0.983       0.996
18255 0.067056  1.96e-05    6.39e-05 1.00e+00   0.897     0.383       0.996
      pglobal_cv qglobal_cv
8821    5.37e-05          1
16565   1.52e-04          1
7412    4.71e-04          1
13331   5.01e-04          1
9056    6.77e-04          1
18255   6.81e-04          1

But when I filter for significant genes, I get no rows:
print(cancer_processed_data$sel_cv[cancer_processed_data$sel_cv$qglobal_cv<0.1, c("gene_name","qglobal_cv")])

<0 rows> (or 0-length row.names)

What could be the reason for this? Does this imply there are no significant genes in the data?
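
For context, here is a minimal sketch of why p-values as small as those above can still give qglobal_cv = 1. It assumes that the q-values are Benjamini-Hochberg adjusted p-values and that roughly 20,000 genes were tested; both are assumptions, not something stated in this thread. It also uses only the six p-values printed above, so it is a partial reconstruction rather than the package's exact computation.

```r
# Six smallest pglobal_cv values, copied from the output above
p_top <- c(5.37e-05, 1.52e-04, 4.71e-04, 5.01e-04, 6.77e-04, 6.81e-04)

# ASSUMPTION: about 20,000 genes were tested (not stated in the thread)
n_genes <- 20000

# Benjamini-Hochberg adjustment, telling p.adjust the total number of tests.
# The omitted genes all have larger p-values than the six listed here.
q_top <- p.adjust(p_top, method = "BH", n = n_genes)
print(q_top)

# If a single gene stood out on its own, its p-value would need to be
# below about 0.1 / n_genes to reach q < 0.1
p_needed <- 0.1 / n_genes
print(p_needed)
```

Under these assumptions, the top gene's p-value multiplied by the number of tests is already above 1, so its q-value is capped at 1 even though the raw p-value looks small.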


shaghayeghsoudi · 2022-10-06T23:32:22Z

Hey, have you been able to find an answer to your question? I am running into the exact same problem and getting no hits. Thanks

im3sanger · 2022-10-12T16:15:10Z

Hello,

Sorry for the very late reply.

Yes, this means that there are no recurrently mutated genes in your dataset reaching statistical significance. Can you explain your experimental design in more detail? From your earlier description it sounds like you are analysing data from a single patient. Is that correct? In that case it would not be unexpected not to find any significant recurrence, as this relies on finding mutations in the same gene across multiple samples or patients.
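
One way to check how much recurrence a dataset contains is to count the distinct samples in which each gene is mutated. The sketch below uses a small made-up table as a stand-in for the annotated mutation table (annotmuts) in the dndscv output; the sample and gene values are hypothetical, and the column names sampleID and gene are assumed to match that table.

```r
# Toy stand-in for the annotated mutation table (hypothetical values)
annotmuts <- data.frame(
  sampleID = c("T1", "T1", "T2", "T3", "T3", "T3"),
  gene     = c("TP53", "KRAS", "TP53", "TP53", "ATRX", "ATRX"),
  stringsAsFactors = FALSE
)

# Number of distinct samples (a value of 1 means no recurrence across samples is possible)
n_samples <- length(unique(annotmuts$sampleID))
print(n_samples)

# Count each gene once per sample, then tabulate samples per gene
pairs <- unique(annotmuts[, c("sampleID", "gene")])
samples_per_gene <- sort(table(pairs$gene), decreasing = TRUE)
print(samples_per_gene)
```

With a single patient, every gene in this table would have a count of 1.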

Inigo

shaghayeghsoudi · 2022-10-21T00:38:24Z

Hi Inigo, thanks for your reply. I am working on WES data from 27 sarcoma tumours. They are multi-regional: for each tumour I have 3-6 regions sampled and sequenced, which I merge into one sample per tumour by removing duplicate mutations. I was expecting to find at least a few hits, as sarcomas are not normally SSM-type tumours, but all q-values are equal to one and nothing is significant.
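
The merging step described above could look roughly like the following sketch. The table, its tumour and region columns, and all values are hypothetical; the only assumption carried over from the thread is the five-column input layout (sampleID, chr, pos, ref, mut).

```r
# Hypothetical per-region mutation calls for two tumours
regions <- data.frame(
  tumour = c("S01", "S01", "S01", "S02"),
  region = c("R1", "R2", "R1", "R1"),
  chr    = c("1", "1", "2", "1"),
  pos    = c(100L, 100L, 200L, 100L),
  ref    = c("A", "A", "C", "A"),
  mut    = c("T", "T", "G", "T"),
  stringsAsFactors = FALSE
)

# Use the tumour as sampleID, drop the region label, and keep each
# mutation once per tumour
merged <- unique(data.frame(
  sampleID = regions$tumour,
  regions[, c("chr", "pos", "ref", "mut")],
  stringsAsFactors = FALSE
))
print(merged)
```

The mutation shared by regions R1 and R2 of tumour S01 is kept once, while the same change in tumour S02 is kept as a separate row, since it belongs to a different sample.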

im3sanger · 2022-10-21T07:53:59Z

Hello,

Thank you. Apologies, I had not realised that there were questions from separate users.

Can you confirm what value of theta you are getting? (dndsout$nbreg$theta).

Lack of significance can be caused by datasets that are too small or that do not have sufficient recurrence for any gene to reach significance. However, it is always important to check that your theta value is not very low (<<1). Very low theta values mean that there is very high variation in the density of synonymous mutations across genes. This typically reflects problems with the mutation calls, such as recurrent artefacts or SNP contamination. Large variation in the density of mutations across genes (high overdispersion) makes dNdScv more conservative (a gene needs more mutations to emerge from the noise) and results in less significance.

If your dataset has good theta values (>1, or ideally >3) and your mutation calls are reliable, then the lack of significance may reflect insufficient power (small datasets or insufficient recurrence).
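
To illustrate why a low theta means more noise, here is a small sketch. It assumes that theta is the size (dispersion) parameter of a negative binomial model, in which a count with mean mu has variance mu + mu^2 / theta; the mean of 2 and the theta values below are arbitrary choices for illustration, not values from any dataset in this thread.

```r
# Variance of a negative binomial count with mean mu and size parameter theta
nb_variance <- function(mu, theta) mu + mu^2 / theta

mu <- 2                       # arbitrary expected count per gene
thetas <- c(0.1, 1, 3, 10)    # from very low to comfortable

# Simulate counts for each theta and compare with the formula
set.seed(1)
sim_var <- sapply(thetas, function(th) var(rnbinom(1e5, size = th, mu = mu)))

print(data.frame(
  theta         = thetas,
  expected_var  = nb_variance(mu, thetas),
  simulated_var = sim_var
))
```

As theta falls well below 1, the variance grows far beyond the mean, so a gene needs many more mutations than expected before it stands out.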

Best,
Inigo

ym-chen · 2024-08-01T01:23:24Z

Hello,

I encountered the same issue. My samples come from multiple tissue sites of several patients. I used MuTect2 to obtain a set of somatic variants. However, when I used dNdScv to look for driver genes, the qglobal_cv for all genes is close to 1. I noticed that the result shows θ=3.881757. I find this quite confusing.

im3sanger · 2024-08-01T08:09:54Z

Hello ym-chen,

Thanks for your message. Could you clarify how many samples you are analysing here? Lack of significance does not necessarily mean that there are no drivers in your dataset, only that there is not enough evidence to reach statistical significance. This could be due to insufficient power if your dataset is too small.

Best,
Inigo

ym-chen · 2024-12-17T05:36:10Z

@im3sanger Sorry for the late reply. I have 11 individuals, each with about 30 microdissection sites. Each site underwent WES. In my analysis, I treated the individuals as units. I suspect that I couldn't obtain significant results because I filtered out too many mutations while selecting reliable mutation sites in the preliminary phase, which left very scattered data.

im3sanger closed this as completed Oct 12, 2022

im3sanger reopened this Oct 21, 2022
