Hello,
Thank you for your work.
I encountered an issue with the dataset provided at this link: https://zenodo.org/records/7928396. Specifically, after running `python scripts/prepare_data.py --data_path examples/data_ir.pkl --output_path examples/train`, the generated tgt.txt files contain aromatic SMILES strings that appear to be non-standard.
When I try to convert these SMILES strings to SELFIES strings using the selfies library, errors occur. The problem seems to originate from the bracketed aromatic carbon `[c]` in the SMILES strings, which does not seem to be a standard representation for a ring carbon. For example, manually changing `C=C(C)C(=O)Oc1cc[c]cc1` to `C=C(C)C(=O)Oc1ccccc1` resolves the issue and allows the conversion to proceed normally.
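For reference, here is a minimal sketch of the manual workaround described above, using only the Python standard library. The function names and the sample list are hypothetical and not part of the repository. Note that rewriting `[c]` as `c` changes the molecule, since the atom gains an implicit hydrogen, so this only mirrors the manual edit and does not address why `[c]` appears in the data.

```python
import re

# Matches a bare bracketed aromatic carbon "[c]" only;
# it does not match "[cH]", "[c-]", or other bracket atoms.
BARE_AROMATIC_C = re.compile(r"\[c\]")


def find_suspect_lines(lines):
    """Return (index, smiles) pairs for entries that contain "[c]"."""
    return [(i, s) for i, s in enumerate(lines) if BARE_AROMATIC_C.search(s)]


def replace_bare_aromatic_c(smiles):
    """Rewrite "[c]" as "c", mirroring the manual edit.

    Caution: this changes the molecule, because the unbracketed
    aromatic carbon carries an implicit hydrogen.
    """
    return BARE_AROMATIC_C.sub("c", smiles)


# Hypothetical sample standing in for lines read from a tgt.txt file.
targets = ["C=C(C)C(=O)Oc1cc[c]cc1", "CCO"]

suspects = find_suspect_lines(targets)
cleaned = [replace_bare_aromatic_c(s) for s in targets]
```

The cleaned strings could then be passed to the selfies encoder. Listing the suspect lines first makes it possible to check how many entries in the dataset are affected before changing any of them.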
I suspect there might be an issue with the script used for generating IR spectra, particularly in how it handles aromatic rings.
I appreciate your attention to this matter and look forward to your response.