[BUG] Iterative Mode Hangs #198
Comments
Hi, have there been any updates on this? I'm also running into the same error and would like to know whether replacing 0s with 1s is an appropriate solution, or whether there are any newer recommendations.
@tboen1 Have you managed to fix this? I have the same issue.
Also running into this, though it's not really hanging per se: it's clearly still chugging away, since CPU and RAM usage fluctuate on the process. It's more that it takes an intractable amount of time to complete even one iteration. I'm not sure exactly how the algorithm scales with input size, but my hunch is that my input is too large: my counts df is 27,403 x 16,372. I don't really have a code-level fix, as I'm not yet familiar enough with the various parts of the DESeq2 algorithm; maybe the maintainers have more insight into the time-consuming steps of the iterative method (@BorisMuzellec)? As for the suggestions to get rid of the 0s, I've thought of a few methods to test:
Of course, all of these will change the result values, though they differ in how much they change them when tested on a small toy dataset. Which one is least incorrect may still depend on the experiment or use case. In my context, I think excluding samples is probably the most valid, as I'm lucky enough not to need to exclude many. In my data, the gene with the fewest zeros has zeros in 15 samples. With ~27,000 samples, and groups that are also much larger than 15 samples, excluding those 15 doesn't seem too bad to me in exchange for computational tractability.
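The sample-exclusion approach can be sketched with pandas. This is a minimal illustration, not part of PyDESeq2: the helper name `drop_samples_for_zero_free_gene` is hypothetical, and it assumes the counts DataFrame is samples x genes (PyDESeq2's layout) with metadata indexed by the same sample names. Making any one gene zero-free requires dropping every sample where that gene is zero, so the gene with the fewest zeros gives the smallest such set.

```python
import pandas as pd


def drop_samples_for_zero_free_gene(counts: pd.DataFrame, metadata: pd.DataFrame):
    """Hypothetical helper: drop the fewest samples so that one gene has no zeros.

    counts: samples x genes; metadata: indexed by the same sample names.
    """
    # Number of samples in which each gene has a zero count.
    zeros_per_gene = (counts == 0).sum(axis=0)
    # The gene with the fewest zeros needs the fewest samples removed.
    best_gene = zeros_per_gene.idxmin()
    # Keep only the samples where that gene is non-zero.
    keep = counts[best_gene] > 0
    return counts.loc[keep], metadata.loc[keep], best_gene


# Toy example: 4 samples x 3 genes, every gene has at least one zero.
counts = pd.DataFrame(
    {"g1": [0, 5, 0, 3], "g2": [1, 0, 2, 4], "g3": [0, 0, 1, 2]},
    index=["s1", "s2", "s3", "s4"],
)
metadata = pd.DataFrame({"status": ["a", "b", "a", "b"]}, index=counts.index)

kept_counts, kept_meta, gene = drop_samples_for_zero_free_gene(counts, metadata)
print(gene, list(kept_counts.index))  # g2 ['s1', 's3', 's4']
```

After this, at least one gene is strictly positive in every remaining sample, so the "every gene contains at least one zero" condition no longer holds. As noted above, the dropped samples stop contributing to the results, so it is worth checking that the groups in `status` stay reasonably balanced.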
The Issue: Iterative Mode Hangs
RuntimeWarning: Every gene contains at least one zero, cannot compute log geometric means. Switching to iterative mode.
But iterative mode just gets stuck without completing:
Fitting dispersions...
done in 1.39 seconds.
Fitting MAP dispersions...
done in 1.40 seconds.
- gets stuck here -
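For context, the warning refers to the standard DESeq2 size-factor estimate (median of ratios), which compares each count to its gene's geometric mean across samples. The NumPy sketch below follows that textbook computation; it is not PyDESeq2's actual code, and the function name is hypothetical. A single zero makes a gene's log geometric mean `-inf`, so that gene cannot serve as a reference, and if every gene has a zero, no reference gene is left.

```python
import numpy as np


def median_of_ratios_size_factors(counts: np.ndarray) -> np.ndarray:
    """Textbook DESeq2-style size factors for a samples x genes count matrix."""
    # log(0) = -inf; silence the divide-by-zero warning NumPy would emit.
    with np.errstate(divide="ignore"):
        log_counts = np.log(counts.astype(float))
    # Per-gene log geometric mean: -inf for any gene with at least one zero.
    log_geomeans = log_counts.mean(axis=0)
    usable = np.isfinite(log_geomeans)
    if not usable.any():
        # The situation described by the RuntimeWarning above.
        raise ValueError("Every gene contains at least one zero")
    # Log ratio of each count to its gene's geometric mean (usable genes only).
    log_ratios = log_counts[:, usable] - log_geomeans[usable]
    # Size factor per sample = median ratio across the usable genes.
    return np.exp(np.median(log_ratios, axis=1))


# 2 samples x 3 genes; the middle gene has a zero and is skipped.
counts = np.array([[10, 0, 20], [20, 5, 40]])
print(median_of_ratios_size_factors(counts))
```

Here the call returns approximately `[0.707, 1.414]`, since sample 2 has twice the counts of sample 1 on the two zero-free genes. With a matrix like `[[0, 3], [4, 0]]` it raises instead, which is the case where PyDESeq2 falls back to iterative mode.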
To Reproduce
Python 3.10, clean install with pip install pydeseq2. Using PyCharm IDE.
from pydeseq2.dds import DeseqDataSet

# Create the DeseqDataSet (DDS)
dds = DeseqDataSet(counts=x_train_count,
                   metadata=metadata,
                   design_factors='status',
                   refit_cooks=True)

# Run DESeq2
dds.deseq2()
The RuntimeWarning appears and PyDESeq2 switches to iterative mode, but it gets stuck after fitting MAP dispersions.
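`x_train_count` and `metadata` aren't included in the report. As a hypothetical stand-in for reproducing the trigger condition, a small synthetic dataset in which every gene has at least one zero can be generated with NumPy and pandas. The function name, dimensions, and negative-binomial parameters are arbitrary assumptions; only the samples x genes layout and the `status` column follow the report.

```python
import numpy as np
import pandas as pd


def make_counts_with_zero_in_every_gene(n_samples: int = 20, n_genes: int = 50, seed: int = 0):
    """Hypothetical toy data: samples x genes counts where no gene is zero-free."""
    rng = np.random.default_rng(seed)
    # Overdispersed integer counts, roughly RNA-seq-like (parameters are arbitrary).
    counts = rng.negative_binomial(n=5, p=0.1, size=(n_samples, n_genes))
    # Force one zero per gene, in a random sample, so every gene has a zero.
    zero_rows = rng.integers(0, n_samples, size=n_genes)
    counts[zero_rows, np.arange(n_genes)] = 0
    samples = [f"sample{i}" for i in range(n_samples)]
    genes = [f"gene{j}" for j in range(n_genes)]
    counts_df = pd.DataFrame(counts, index=samples, columns=genes)
    # Two-level condition column, named like the report's design factor.
    status = np.where(np.arange(n_samples) % 2 == 0, "A", "B")
    metadata = pd.DataFrame({"status": status}, index=samples)
    return counts_df, metadata


counts_df, metadata = make_counts_with_zero_in_every_gene()
print((counts_df == 0).any(axis=0).all())  # True: every gene has at least one zero
```

Passing `counts_df` and `metadata` to `DeseqDataSet` with the arguments above would be expected to produce the same RuntimeWarning. Whether it also reproduces the hang is untested here; reports in this thread suggest it shows up mainly on large inputs, so larger `n_samples`/`n_genes` may be needed.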
Expected behavior
Iterative mode completes without getting stuck.
Additional context
I tried running it in a Jupyter notebook, but the same issue comes up. I suspect there's a bug somewhere in the code.
I replaced all NaN values with 1 (x_train_count.fillna(1)) and was then able to run dds.deseq2(). Most genes returned NaN after DeseqStats(dds), but the results were still usable. However, I'm not sure what effect replacing zeros/NaN with 1s has on the analysis. Ideally, either iterative mode or standard mode would be able to run with NaN values or zeros.
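One detail worth checking: in pandas, `DataFrame.fillna(1)` only replaces missing values (NaN), so any true zeros in the counts stay zero, whereas `replace(0, 1)` changes the zeros themselves. The toy example below uses hypothetical data and a hypothetical helper name to show the difference in terms of the condition behind the warning. It does not settle whether either substitution is statistically appropriate; it only shows what each call changes.

```python
import numpy as np
import pandas as pd


def has_zero_free_gene(counts: pd.DataFrame) -> bool:
    """True if at least one gene (column) is strictly positive in every sample."""
    return bool((counts > 0).all(axis=0).any())


# Toy samples x genes matrix with one missing value and two zeros.
counts = pd.DataFrame(
    {"g1": [0.0, 12.0, np.nan], "g2": [7.0, 0.0, 3.0]},
    index=["s1", "s2", "s3"],
)

# fillna only replaces NaN; the zeros are left in place.
filled = counts.fillna(1)
# Replacing zeros as well removes every zero (and alters the data more).
replaced = counts.fillna(1).replace(0, 1).astype(int)

print(has_zero_free_gene(filled))    # False
print(has_zero_free_gene(replaced))  # True
```

The `astype(int)` cast restores integer counts after NaN forced a float dtype, since DESeq2-style models are defined on raw integer counts.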
Update: I tried running it in VS Code, both in the terminal and in the interactive window, but the same issue occurs: it gets stuck.