Inconsistent prediction: pred in logger vs pred from .predict function #110
Sorry for the confusion. If you do have the label, or you know exactly how many outliers are in the dataset, e.g., 15%, you can specify the `contamination` accordingly. However, in many cases, our users do not have any labels, so we set a default `contamination` for that case. To avoid setting the threshold at all, we also provide AUC, AP, and Recall@k for easier evaluation.
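For illustration, a minimal sketch of the first option, reusing the `pyg_graph_train`/`pyg_graph_test` objects from this thread and assuming `GAE` exposes the `contamination` argument used in the base-class snippet quoted further below:

```python
from pygod.detector import GAE

# If ~15% of the nodes are known to be outliers, pass that rate so that
# predict() thresholds the decision scores at the matching percentile.
detector = GAE(contamination=0.15, epoch=100)
detector.fit(pyg_graph_train)
pred = detector.predict(pyg_graph_test)
```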
Hello. I've also been using GAE for anomaly detection recently. However, errors have been constantly reported during the import process. Could I refer to your usage code? The following is my error message. Thank you very much.

RuntimeError: pyg::neighbor_sample() Expected a value of type 'Optional[Tensor]' for argument 'edge_weight' but instead found type 'bool'.

The following is my code:

```python
from pygod.detector import GAE
from pygod.utils import load_data

# Function to train the anomaly detector
def train_anomaly_detector(model, graph):
    ...

# Function to evaluate the anomaly detector
def eval_anomaly_detector(model, graph):
    ...

graph = load_data('weibo')

# Initialize and evaluate the model
graph.y = graph.y.bool()
if hasattr(graph, 'edge_weight'):
    ...

model = GAE(epoch=100)
```
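One hedged workaround sketch, inferred only from the error message (pyg::neighbor_sample expects `edge_weight` to be an `Optional[Tensor]`, so a bool attribute would trip it); this is a guess, not a confirmed fix:

```python
import torch

# If the dataset stores a bool edge_weight, cast it to float so that
# neighbor sampling receives a tensor of an expected dtype; deleting
# the attribute entirely (del graph.edge_weight) is the other option.
if getattr(graph, 'edge_weight', None) is not None and graph.edge_weight.dtype == torch.bool:
    graph.edge_weight = graph.edge_weight.float()
```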
I have a GAE model trained on the data pyg_graph_train.
Then, I use pyg_graph_test for model prediction.
I tried this:
```python
pred, score = model.predict(pyg_graph_test, label=pyg_graph_test.label, return_score=True)
```
And I got "Recall 0.7490 | Precision 0.7490 | AP 0.6226 | F1 0.7490"
But when I check the pred and score:
```python
f1_score(y_true=pyg_graph_test.label, y_pred=pred)
```
I got 0.34680888045878483, which is inconsistent.
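For reference, the logged numbers can be reproduced from the returned score by re-deriving the threshold from the test labels, the way the logger does; a sketch, assuming label and score are CPU tensors or arrays:

```python
import numpy as np
from sklearn.metrics import f1_score

# Threshold at the percentile implied by the labeled outlier rate of
# the *test* set, exactly as the logger does.
label = np.asarray(pyg_graph_test.label)
score_np = np.asarray(score)
contamination = label.sum() / len(label)
threshold = np.percentile(score_np, 100 * (1 - contamination))
pred_logger = (score_np > threshold).astype(int)

f1_score(y_true=label, y_pred=pred_logger)  # should match the logged F1 (0.7490)
```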
I found that the pred returned by the predict function is not the same as the one computed in the logger function (pygod.utils.utility), because of different threshold values.
In the logger function:

```python
contamination = sum(target) / len(target)
threshold = np.percentile(score, 100 * (1 - contamination))
pred = (score > threshold).long()
```
In contrast, in the predict function (pygod.detector.base):

```python
if return_pred:
    pred = (score > self.threshold_).long()
```
The "self.threshold_" is determined in _process_decision_score as:
self.threshold_ = np.percentile(self.decision_score_, 100 * (1 - self.contamination))
So, which prediction (i.e., which threshold value) is correct? Or is there something I may have missed?
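To make the difference concrete, here is a self-contained toy demo of the two thresholding rules (synthetic scores; the 0.1 default contamination is an assumption, and note that in the real detector `self.threshold_` is additionally derived from the *training* scores rather than the test scores):

```python
import numpy as np

rng = np.random.default_rng(0)
score = rng.random(1000)              # stand-in decision scores
target = (score > 0.7).astype(int)    # ~30% labeled outliers

# predict(): percentile set by the detector's contamination (0.1 assumed)
threshold_predict = np.percentile(score, 100 * (1 - 0.1))
# logger: percentile set by the label-derived contamination (~0.3 here)
threshold_logger = np.percentile(score, 100 * (1 - target.mean()))

pred_predict = (score > threshold_predict).astype(int)
pred_logger = (score > threshold_logger).astype(int)
print((pred_predict != pred_logger).sum())  # ~200 nodes flagged differently
```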