- PDNS-Net: A Large Heterogeneous Graph Benchmark Dataset of Network Resolutions for Graph LearningUdesh Kumarasinghe, Fatih Deniz, and Mohamed NabeelMar 2022
In order to advance the state of the art in graph learning algorithms, it is necessary to construct large real-world datasets. While there are many benchmark datasets for homogeneous graphs, only a few of them are available for heterogeneous graphs. Furthermore, the latter graphs are small in size rendering them insufficient to understand how graph learning algorithms perform in terms of classification metrics and computational resource utilization. We introduce, PDNS-Net, the largest public heterogeneous graph dataset containing 447K nodes and 897K edges for the malicious domain classification task. Compared to the popular heterogeneous datasets IMDB and DBLP, PDNS-Net is 38 and 17 times bigger respectively. We provide a detailed analysis of PDNS-Net including the data collection methodology, heterogeneous graph construction, descriptive statistics and preliminary graph classification performance. The dataset is publicly available at this https URL. Our preliminary evaluation of both popular homogeneous and heterogeneous graph neural networks on PDNS-Net reveals that further research is required to improve the performance of these models on large heterogeneous graphs.
- HeteroGuard: Defending Heterogeneous Graph Neural Networks against Adversarial AttacksUdesh Kumarasinghe, Mohamed Nabeel, Kasun De Zoyza, and 2 more authorsIn 2022 International Conference on Data Mining Workshops (ICDMW) Dec 2022
Graph neural networks (GNNs) have achieved remarkable success in many application domains including drug discovery, program analysis, social networks, and cyber security. However, it has been shown that they are not robust against adversarial attacks. In the recent past, many adversarial attacks against homogeneous GNNs and defenses have been proposed. However, most of these attacks and defenses are ineffective on heterogeneous graphs as these algorithms optimize under the assumption that all edge and node types are of the same and further they introduce semantically incorrect edges to perturbed graphs. Here, we first develop, HetePR-BCD, a training time (i.e. poisoning) adversarial attack on heterogeneous graphs that outperforms the start of the art attacks proposed in the literature. Our experimental results on three benchmark heterogeneous graphs show that our attack, with a small perturbation budget of 15%, degrades the performance up to 32% (F1 score) compared to existing ones. It is concerning to mention that existing defenses are not robust against our attack. These defenses primarily modify the GNN’s neural message passing operators assuming that adversarial attacks tend to connect nodes with dissimilar features, but this assumption does not hold in heterogeneous graphs. We construct HeteroGuard, an effective defense against training time attacks including HetePR-BCD on heterogeneous models. HeteroGuard outperforms the existing defenses by 3-8% on F1 score depending on the benchmark dataset.