Adding pairwise GED calculation and simple optimizations #13
I recently used the Graph_Edit_Distance repository to compute the matrix of pairwise GEDs for a database of compounds. Since this is a common task in comparing molecular representations or clustering compounds (e.g., with the Butina algorithm), I thought it would be really useful to have this functionality built in.
I implemented it as a third mode, alongside search and pair. It computes the matrix of pairwise distances between all compounds in the database file and saves it as a `.csv` file. It can be run with the following command:
If a query file is specified, it is ignored. If no results file is specified, the results are saved to the default location, `datasets/pairwse_ged.csv`.
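To illustrate what the new mode computes, here is a minimal C++ sketch of building a pairwise distance matrix and writing it to CSV. It is not the code from this PR: `Graph`, `toy_distance`, `pairwise_matrix`, `write_csv` and `results_path_or_default` are hypothetical names, `toy_distance` is only a placeholder (not GED) so the sketch runs, and the CSV layout (a header row of names plus a leading name column) is an assumption. It also assumes the distance is symmetric, as GED is when edit costs are symmetric.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <fstream>
#include <functional>
#include <string>
#include <vector>

// Hypothetical, minimal stand-in for a parsed compound graph.
// The repository's real graph type will look different.
struct Graph {
    std::string name;
    std::size_t num_nodes;
    std::size_t num_edges;
};

using Matrix = std::vector<std::vector<double>>;
using DistanceFn = std::function<double(const Graph&, const Graph&)>;

// Placeholder distance so the sketch runs; it is NOT graph edit distance.
// Swap in the repository's GED routine here.
double toy_distance(const Graph& a, const Graph& b) {
    return std::fabs(static_cast<double>(a.num_nodes) - static_cast<double>(b.num_nodes)) +
           std::fabs(static_cast<double>(a.num_edges) - static_cast<double>(b.num_edges));
}

// Builds the n x n distance matrix, evaluating each unordered pair once.
// Assumes dist is symmetric and dist(g, g) == 0, so the diagonal stays zero.
Matrix pairwise_matrix(const std::vector<Graph>& graphs, const DistanceFn& dist) {
    const std::size_t n = graphs.size();
    Matrix m(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = i + 1; j < n; ++j) {
            m[i][j] = m[j][i] = dist(graphs[i], graphs[j]);
        }
    }
    return m;
}

// Writes the matrix as CSV: a header row of compound names, then one row per
// compound starting with its name. Returns false if the file cannot be written.
bool write_csv(const std::string& path, const std::vector<Graph>& graphs, const Matrix& m) {
    assert(m.size() == graphs.size() && "matrix must have one row per graph");
    std::ofstream out(path);
    if (!out) return false;
    out << "id";
    for (const auto& g : graphs) out << ',' << g.name;
    out << '\n';
    for (std::size_t i = 0; i < graphs.size(); ++i) {
        out << graphs[i].name;
        for (double d : m[i]) out << ',' << d;
        out << '\n';
    }
    return static_cast<bool>(out);
}

// Resolves the output path, falling back to the default location used by
// this mode when no results file was given.
std::string results_path_or_default(const std::string& results_file) {
    return results_file.empty() ? "datasets/pairwse_ged.csv" : results_file;
}
```

Because the inner loop starts at `j = i + 1`, a database of n compounds needs n(n-1)/2 distance evaluations instead of n², which matters when each evaluation is an expensive GED computation.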
Additionally, I made some small changes to the code style to make it more idiomatic C++, which also reduced execution times. For example, the execution time for the following:
decreased from 9,263,939 to 8,785,024 microseconds (about 9.26 s to 8.79 s, a reduction of roughly 5%). It's not a dramatic change, but it's certainly better than nothing. In my humble opinion, the code is also easier to read overall.
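For anyone wanting to reproduce a before/after comparison like this, here is a small C++ sketch of timing a workload in microseconds with `std::chrono::steady_clock`. I am assuming nothing about how the repository itself measures time; `mean_microseconds` and `relative_reduction` are hypothetical helpers. Plugging in the two timings above, `relative_reduction(9263939, 8785024)` gives about 0.0517.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <functional>

// Runs `work` `runs` times and returns the mean wall-clock duration in
// microseconds. steady_clock is monotonic, so system clock adjustments
// during the run don't distort the measurement.
std::int64_t mean_microseconds(const std::function<void()>& work, int runs) {
    assert(runs > 0 && "need at least one run");
    std::int64_t total = 0;
    for (int r = 0; r < runs; ++r) {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto stop = std::chrono::steady_clock::now();
        total += std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
    }
    return total / runs;
}

// Fraction by which run time dropped from `before` to `after`
// (e.g. 0.05 means the new version takes 5% less time).
double relative_reduction(double before, double after) {
    return (before - after) / before;
}
```

Single runs can be noisy, so averaging several runs of each version gives a fairer comparison than one measurement apiece.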
I'd be really glad if you would consider these changes. Thanks in advance!