Failed complexes for inference.py differ between CPU and GPU and across batch sizes #229
Comments
@gcorso Please clarify the above issue.
Could you try again with the most recent version?
Sure.
Hi @jsilter, for CPU execution we are getting a different number of failed cases across runs. The error is pasted below:
During handling of the above exception, another exception occurred: Traceback (most recent call last):
When we run the failed complexes individually, they execute successfully most of the time but sometimes fail as well. Can you please let us know the reason for this variation in the output?
Hi,
I ran inference.py on the 363 complexes listed in testset_csv.csv and got the following failures:
On CPU, 21 complexes failed (linalg.svd: The algorithm failed to converge because the input matrix contained non-finite values).
On GPU, 16 complexes failed (CUDA out of memory).
In addition, 6hlb is not available in the test data.
Our aim is to compare CPU and GPU time on the set of complexes that do not fail.
So we tested again on CPU after removing the 21 failed complexes, but 14 more complexes failed; after removing those 14 as well, 16 complexes failed.
On GPU, after removing the 16 failed complexes, the remaining complexes ran without any failures.
The above experiments were run with batch_size=10.
With batch_size=1, 29 complexes failed on GPU and 30 complexes failed on CPU, with errors like the one below:
"""" Failed on ['6mjj'] linalg.svd: (Batch element 2): The algorithm failed to converge because the input matrix contained non-finite values.
""""
Please let us know the reason for this, as we are getting different failed cases on CPU and GPU and for different batch sizes.
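For reference, here is a minimal sketch of how the failure lists from two runs could be compared to find the complexes that succeed on both devices. It assumes the log lines have the `Failed on ['<id>'] <message>` form quoted above. The function names (`parse_failures`, `classify`, `compare_runs`) are hypothetical helpers and not part of inference.py.

```python
import re

# Matches lines such as: Failed on ['6mjj'] linalg.svd: ...
# Assumption: the pattern is inferred from the single log line quoted in this issue.
FAILED_RE = re.compile(r"Failed on \[(?P<ids>[^\]]*)\]\s*(?P<msg>.*)")


def parse_failures(log_text):
    """Return {complex_id: error_message} for every 'Failed on' line in a log."""
    failures = {}
    for line in log_text.splitlines():
        match = FAILED_RE.search(line)
        if match is None:
            continue
        # A batch may list several IDs inside the brackets.
        for complex_id in re.findall(r"'([^']+)'", match.group("ids")):
            failures[complex_id] = match.group("msg").strip()
    return failures


def classify(message):
    """Coarse error category, keyed on the two error texts reported in this issue."""
    if "linalg.svd" in message:
        return "svd_non_finite"
    if "CUDA out of memory" in message:
        return "cuda_oom"
    return "other"


def compare_runs(all_ids, failures_a, failures_b):
    """Return (ok_in_both, failed_only_in_a, failed_only_in_b) as sorted lists."""
    failed_a, failed_b = set(failures_a), set(failures_b)
    ok_in_both = sorted(set(all_ids) - failed_a - failed_b)
    return ok_in_both, sorted(failed_a - failed_b), sorted(failed_b - failed_a)
```

Running `parse_failures` on the CPU log and on the GPU log, then passing both results to `compare_runs` together with the IDs from testset_csv.csv, gives the common subset that could be used for the timing comparison. Because the CPU failures change from run to run, that subset would need to be recomputed from the logs of the runs actually being timed.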