Inconsistent prediction: pred in logger vs pred from .predict function #110

Open

chpoonag opened this issue Nov 5, 2024 · 2 comments

chpoonag commented Nov 5, 2024

I have a GAE model trained on the data pyg_graph_train.
Then I use pyg_graph_test for prediction.

I tried this:
pred, score = model.predict(pyg_graph_test, label=pyg_graph_test.label, return_score=True)
And I got "Recall 0.7490 | Precision 0.7490 | AP 0.6226 | F1 0.7490"

But when I check the returned pred and score with:
f1_score(y_true=pyg_graph_test.label, y_pred=pred)
I get 0.34680888045878483, which is inconsistent with the logged F1.

I found that the pred returned by the predict function is not the same as the one computed in the logger function (pygod.utils.utility), because the two use different threshold values.
In the logger function:
contamination = sum(target) / len(target)
threshold = np.percentile(score, 100 * (1 - contamination))
pred = (score > threshold).long()

In contrast, in the predict function (pygod.detector.base):
if return_pred:
    pred = (score > self.threshold_).long()
The "self.threshold_" is determined in _process_decision_score as:
self.threshold_ = np.percentile(self.decision_score_, 100 * (1 - self.contamination))

So, which prediction (i.e., which threshold value) is correct? Or is there something I have missed?
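
For illustration, here is a minimal sketch (with made-up scores and labels, not taken from the actual run) of how the two thresholds can diverge when the default contamination of 0.1 does not match the true outlier rate:

import numpy as np

score = np.array([0.1, 0.2, 0.3, 0.4, 0.9, 0.95])  # toy anomaly scores
target = np.array([0, 0, 0, 0, 1, 1])               # toy labels, 2/6 outliers

# Logger-style threshold: contamination inferred from the labels
contamination = sum(target) / len(target)
threshold_logger = np.percentile(score, 100 * (1 - contamination))

# predict()-style threshold: contamination fixed at init time (default 0.1)
threshold_predict = np.percentile(score, 100 * (1 - 0.1))

print(threshold_logger, threshold_predict)          # ~0.567 vs. 0.925
pred_logger = (score > threshold_logger).astype(int)    # [0 0 0 0 1 1]
pred_predict = (score > threshold_predict).astype(int)  # [0 0 0 0 0 1]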

@chpoonag chpoonag changed the title Inconsistent prediction Inconsistent prediction: pred in logger vs pred from .predict function Nov 5, 2024
kayzliu (Member) commented Nov 8, 2024

Sorry for the confusion.

If you do have the labels, or you know exactly how many outliers are in the dataset, e.g., 15%, you can specify the contamination when initializing the detector, for example model = DOMINANT(contamination=0.15). The model will then make the binary prediction pred based on this contamination.

However, in many cases our users do not have any labels, so we set the default contamination to 0.1 and the threshold changes correspondingly. That's why you got ~0.3 for F1 from the returned pred. The ~0.7 F1 in the logger is evaluated with the labels, which means the contamination is set to the ideal value.

To avoid setting a threshold at all, we also provide AUC, AP, and Recall@k for easier evaluation.
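
For example, a minimal sketch of a threshold-free evaluation using scikit-learn on the score returned by predict (assuming pyg_graph_test.label and score from the snippet above; the variable names are just for illustration):

from sklearn.metrics import roc_auc_score, average_precision_score

# score comes from model.predict(pyg_graph_test, return_score=True);
# no binary threshold is involved in these metrics
auc = roc_auc_score(pyg_graph_test.label.numpy(), score.numpy())
ap = average_precision_score(pyg_graph_test.label.numpy(), score.numpy())
print(f'AUC {auc:.4f} | AP {ap:.4f}')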

withMoonstar commented
Hello. I've also been using GAE for anomaly detection recently, but errors keep coming up during the import process. Could I refer to your usage code?

The following is my error message. Thank you very much.

RuntimeError: pyg::neighbor_sample() Expected a value of type 'Optional[Tensor]' for argument 'edge_weight' but instead found type 'bool'.

The following is my code:

from pygod.detector import GAE
from pygod.utils import load_data
from sklearn.metrics import roc_auc_score, average_precision_score

# Function to train the anomaly detector
def train_anomaly_detector(model, graph):
    return model.fit(graph)

# Function to evaluate the anomaly detector
def eval_anomaly_detector(model, graph):
    outlier_scores = model.decision_function(graph)
    auc = roc_auc_score(graph.y.numpy(), outlier_scores)
    ap = average_precision_score(graph.y.numpy(), outlier_scores)
    print(f'AUC Score: {auc:.3f}')
    print(f'AP Score: {ap:.3f}')

graph = load_data('weibo')

# Initialize and evaluate the model
graph.y = graph.y.bool()

if hasattr(graph, 'edge_weight'):
    graph.edge_weight = None

model = GAE(epoch=100)
model = train_anomaly_detector(model, graph)
eval_anomaly_detector(model, graph)
