How can I establish the relation between each graph and its respective news ID? #24
Closed
Humberto-Turioni started this conversation in General
-
Hi! If you are still looking for an answer to this, I think I may be able to help (since I needed this for my own use case as well).
Here is some sample code showing how to get the IDs. The comments explain the steps:

```python
import pickle

import numpy as np
from torch_geometric.datasets import UPFD

with open("<your-path>/pol_id_twitter_mapping.pkl", "rb") as f:
    id_mapping = pickle.load(f)

# the id_indices (keys) should be in ascending order:
assert np.all(np.array([k for k in id_mapping]) == np.arange(len(id_mapping)))
# therefore the id_values are also iterated in the correct order

# now we just need to find the root ids; we can exploit the fact that
# root node ids are always non-numeric strings, while the Twitter ids are integers
graph_ids: list[list[str]] = []
for v in id_mapping.values():
    try:
        int(v)
        graph_ids[-1].append(v)  # numeric Twitter id: belongs to the current graph
    except ValueError:
        graph_ids.append([v])  # non-numeric root id: starts a new graph
root_ids: list[str] = [i[0] for i in graph_ids]

# alternatively (maybe less "hacky", but involves more steps and another file) we could use
# <your-path>/politifact/raw/node_graph_id.npy
# then find the first node (root) index of each graph
# then use the root indices as keys in id_mapping to get the root ids

# now we have the root ids, and can index the training graph positions
train_indices = np.load("<your-path>/politifact/raw/train_idx.npy")
train_root_ids = [root_ids[i] for i in train_indices]
train_all_ids = {root_ids[i]: graph_ids[i] for i in train_indices}
# done! you should now have what you were looking for

# as extra insurance that this is the correct mapping, you can check that the graph sizes match the
# number of ids that we collected for each graph above:
train_dataset = UPFD("<your-path>", "politifact", "content", "train")
assert len(train_dataset) == len(train_root_ids)
for graph_id, graph in zip(train_root_ids, train_dataset):
    assert len(train_all_ids[graph_id]) == graph.x.shape[0]
    print(graph_id)
```
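The less "hacky" alternative mentioned in the comments (using `node_graph_id.npy` instead of parsing the ID strings) could look roughly like the sketch below. It assumes that `node_graph_id.npy` stores, for every global node index, the index of the graph that node belongs to, that the root is the first node of each graph, and that `id_mapping` uses the same global node indices as keys. The helper `root_ids_from_node_graph_id` is hypothetical, and the toy arrays only stand in for the real files.

```python
import numpy as np


def root_ids_from_node_graph_id(node_graph_id, id_mapping):
    """Hypothetical helper: return the ID of the root (first) node of every graph.

    Assumes node_graph_id[n] is the graph index of global node n (0..num_graphs-1),
    that each graph's root is its first node, and that id_mapping maps
    global node index -> ID.
    """
    node_graph_id = np.asarray(node_graph_id)
    # return_index=True yields the position of the first occurrence of each graph
    # index, i.e. the global node index of each graph's root
    graph_idx, first_node = np.unique(node_graph_id, return_index=True)
    # every graph index should be present, in ascending order, with no gaps
    assert np.array_equal(graph_idx, np.arange(len(graph_idx)))
    return [id_mapping[int(n)] for n in first_node]


# toy data standing in for node_graph_id.npy and the pickled id mapping
toy_node_graph_id = np.array([0, 0, 0, 1, 1, 2])
toy_id_mapping = {0: "news-a", 1: "111", 2: "222", 3: "news-b", 4: "333", 5: "news-c"}
print(root_ids_from_node_graph_id(toy_node_graph_id, toy_id_mapping))
# ['news-a', 'news-b', 'news-c']
```

With the real files this would be called as `root_ids_from_node_graph_id(np.load("<your-path>/politifact/raw/node_graph_id.npy"), id_mapping)`, and if the assumptions hold the result should equal `root_ids` from the string-parsing approach.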
-
Thank you, Yingtong and Philipp!
-
Hi!
I'm having difficulty establishing the relation between the graphs and the news IDs.
For example, with

```python
from torch_geometric.datasets import UPFD

train_data = UPFD(root=r"C:\Users\user\Desktop\execucao", name="gossipcop", feature="content", split="train")
```
given a graph train_data[x], I couldn't establish which entry in gos_news_list.txt it corresponds to.
I want to know which news item each graph belongs to.
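Based on the reply in this thread, the lookup reduces to two steps: item `x` of a UPFD split is global graph `split_idx[x]` (with `split_idx` loaded from the split's `train_idx.npy`, `val_idx.npy` or `test_idx.npy`), and that graph's news ID is the ID of its root node. A minimal sketch, where `news_id_of` is a hypothetical helper and the toy arrays stand in for the real index file and the `root_ids` list built in the reply:

```python
import numpy as np


def news_id_of(x, split_idx, root_ids):
    """Hypothetical helper: news ID of item x in a UPFD split.

    split_idx: the split's global graph indices (e.g. loaded from train_idx.npy)
    root_ids:  root (news) ID of every global graph, in graph order
    """
    return root_ids[int(split_idx[x])]


# toy stand-ins for train_idx.npy and root_ids
toy_train_idx = np.array([2, 0])
toy_root_ids = ["news-a", "news-b", "news-c"]
print(news_id_of(0, toy_train_idx, toy_root_ids))  # news-c
print(news_id_of(1, toy_train_idx, toy_root_ids))  # news-a
```

For `gossipcop`, the same steps would presumably apply using `<root>/gossipcop/raw/` and the gossipcop counterpart of the politifact ID mapping file used in the reply.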